HashMap Implementation Details (JDK 1.8)

mitre

Last modified 2022-11-09 00:04:25

Column: Java; tags: java, hash algorithm, linked list

First published 2021-11-08 01:57:19

Article link: https://blog.csdn.net/fsdgsddaer/article/details/121193247

1 Introduction

Differences between HashMap before JDK 1.8 and in JDK 1.8:

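Item	Before JDK 1.8	JDK 1.8
Node type	Entry	Node/TreeNode
Storage structure	array + singly linked list	array + singly linked list/red-black tree
Insertion into a bin's list	head insertion	tail insertion
Resize timing	resize first, then insert	insert first, then resize
hash algorithm	4 shifts + 5 XORs	1 shift + 1 XOR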
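To make the hash row of the table concrete, here is a sketch of both spreading functions side by side. The JDK 1.7 version is reproduced from the JDK 7 source as best recalled, with hashSeed assumed to be 0 (its default) and the alternative String hashing path omitted; the class and method names are hypothetical.

```java
public class HashSpreadCompare {
    // JDK 1.7-style spreading, sketched with hashSeed assumed to be 0:
    // 4 unsigned shifts and 5 XORs in total
    static int hashJdk7(int hashCode) {
        int h = 0;                        // hashSeed (assumed 0)
        h ^= hashCode;                    // XOR 1
        h ^= (h >>> 20) ^ (h >>> 12);     // shifts 1-2, XORs 2-3
        return h ^ (h >>> 7) ^ (h >>> 4); // shifts 3-4, XORs 4-5
    }

    // JDK 1.8 spreading (what HashMap.hash does for a non-null key): 1 shift and 1 XOR
    static int hashJdk8(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    public static void main(String[] args) {
        int hc = "HashMap".hashCode();
        System.out.printf("hashCode=%08x jdk7=%08x jdk8=%08x%n",
                hc, hashJdk7(hc), hashJdk8(hc));
    }
}
```

The JDK 1.8 version relies on red-black trees to absorb bad distributions, so it can afford a much cheaper mix.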

All of the analysis below is based on JDK 1.8.

1.1 Main fields of HashMap

HashMap maintains an array Node[] table.
Each slot of this array is called a bin,
which can be thought of as a "container" or "bucket".

Main fields of HashMap:

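// Node array, the core of HashMap; stores the key-value mappings.
// Its length is always a power of two.
transient Node<K,V>[] table;

// Number of key-value mappings currently in the map
transient int size;

// Number of times the map has been structurally modified.
// A structural modification changes the number of mappings or the internal structure (e.g. a rehash).
// Iterators use this field to fail fast (ConcurrentModificationException) if the map is modified during iteration.
transient int modCount;

// Size at which the table (Node array) is resized: capacity * loadFactor,
// where capacity is table.length and loadFactor is the load factor.
// Before the table is allocated, this field holds the initial capacity (0 means use the default).
int threshold;

// Load factor (0.75 by default)
final float loadFactor;


// Default initial capacity of the table (table.length), MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum table capacity, MUST be a power of two <= 1<<30.
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// When a node is added to a bin that already holds at least 8 nodes,
// the list "may" be converted into a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// When a tree bin is split during resize and a resulting part has at most 6 nodes,
// that part is converted back into a linked list
static final int UNTREEIFY_THRESHOLD = 6;

// Minimum table capacity required before a bin may be treeified;
// if the table is smaller than this, it is resized instead of treeified.
// MIN_TREEIFY_CAPACITY should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resize and treeify thresholds.
// A list is converted into a red-black tree only when table.length >= MIN_TREEIFY_CAPACITY and a node is added to a bin that already holds at least 8 nodes
static final int MIN_TREEIFY_CAPACITY = 64;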
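The modCount comment can be checked through the public HashMap API. In the sketch below (class and method names hypothetical), adding a new key while iterating is a structural modification, so the iterator should throw ConcurrentModificationException; overwriting the value of an existing key is not structural, so iteration finishes normally. The Javadoc describes fail-fast behavior as best-effort, but in this single-threaded case it is deterministic.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    // Adding a new key during iteration changes modCount,
    // so the iterator's next call throws ConcurrentModificationException
    static boolean addingDuringIterationFails() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key + "-new", 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Overwriting an existing key returns from putVal before ++modCount,
    // so the iteration completes normally
    static boolean overwritingDuringIterationSucceeds() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        for (String key : map.keySet()) {
            map.put(key, 42);
        }
        return map.get("a") == 42 && map.get("b") == 42;
    }

    public static void main(String[] args) {
        System.out.println("add during iteration throws: " + addingDuringIterationFails());
        System.out.println("overwrite during iteration ok: " + overwritingDuringIterationSucceeds());
    }
}
```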

1.2 Inner classes of HashMap

HashMap's inner class Node:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        // constructor, getters, setters, equals, hashCode and toString omitted
    }

As you can see, Node forms a singly linked list: each node only references the next one.

HashMap's inner class TreeNode:

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        // all methods omitted
    }

Structure of LinkedHashMap.Entry:

    /**
     * HashMap.Node subclass for normal LinkedHashMap entries.
     */
    static class Entry<K,V> extends HashMap.Node<K,V> {
        Entry<K,V> before, after;
        Entry(int hash, K key, V value, Node<K,V> next) {
            super(hash, key, value, next);
        }
    }

So LinkedHashMap.Entry extends HashMap.Node, and therefore TreeNode is also a subclass of Node.

1.3 The hash algorithm

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     *
     * In plain terms: the high 16 bits of key.hashCode() are XORed into
     * the low 16 bits (only the low 16 bits of the result change).
     * Because the table index is taken by masking with a power of two
     * minus one, hashes that differ only in bits above the mask would
     * always collide, so the high bits are folded downward to let them
     * affect the index. This is a deliberately cheap trade-off: many
     * hash sets are already well distributed, and large collision sets
     * in a bin are handled by trees.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

The index of an element in the array is computed as follows, where n is the table length:
i = (n - 1) & hash

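Because the table length in HashMap is always a power of two, n - 1 always has all high bits 0 and all low bits 1 (it looks like 000...0111...111). The AND therefore keeps only the low log2(n) bits of hash, which gives the same result as Math.floorMod(hash, n) but is cheaper than a division.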
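A quick check of the masking rule: the sketch below (class and method names hypothetical) computes the index exactly as putVal does and prints Math.floorMod alongside it, including for negative hashes.

```java
public class MaskIndexDemo {
    // Bucket index as computed in putVal; n must be a power of two
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // n - 1 = 0b1111
        int[] hashes = {5, 21, -7, Integer.MIN_VALUE + 3};
        for (int h : hashes) {
            // The mask keeps only the low 4 bits, matching Math.floorMod(h, 16)
            System.out.println(h + " -> " + index(h, n)
                    + " (floorMod: " + Math.floorMod(h, n) + ")");
        }
    }
}
```

Note that 5 and 21 share index 5: they differ only in bit 4, which the mask discards.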

Aside: if you are not familiar with bitwise operations, see this blog post.

Consider an example [1]. Suppose table.length = 16 and there are two elements A and B whose hashes are H1 and H2. If Object.hashCode() were used directly, they would collide:

H1: 00000000 00000000 00000000 00000101
H2: 00000000 11111111 00000000 00000101

// Hash collision example:
index1 =  H1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 = 5
index2 =  H2 & (n - 1) = 00000000 11111111 00000000 00000101 & 1111 = 0101 = 5

However, if the high 16 bits are used to "perturb" the low 16 bits, the collision disappears:

00000000 00000000 00000000 00000101 // H1
00000000 00000000 00000000 00000000 // H1 >>> 16
00000000 00000000 00000000 00000101 // hash1 = H1 ^ (H1 >>> 16)

00000000 11111111 00000000 00000101 // H2
00000000 00000000 00000000 11111111 // H2 >>> 16
00000000 11111111 00000000 11111010 // hash2 = H2 ^ (H2 >>> 16)

// No hash collision
index1 = hash1 & (n - 1) = 00000000 00000000 00000000 00000101 & 1111 = 0101 =  5
index2 = hash2 & (n - 1) = 00000000 11111111 00000000 11111010 & 1111 = 1010 = 10

Summary:
hash(key) computes the hash value of key, "perturbing" the low 16 bits with the high 16 bits so that the resulting indices are more evenly distributed.
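The H1/H2 example above can be reproduced directly. This sketch (class name hypothetical) applies the same formula as hash(key) for a non-null key and prints the bin index for a 16-slot table with and without the perturbation.

```java
public class PerturbationDemo {
    // Same formula as HashMap.hash(key) for a non-null key
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0b00000000_00000000_00000000_00000101; // H1
        int h2 = 0b00000000_11111111_00000000_00000101; // H2
        // Without spreading both land in bin 5; prints "raw: 5, 5"
        System.out.println("raw: " + (h1 & (n - 1)) + ", " + (h2 & (n - 1)));
        // With spreading H2 moves to bin 10; prints "spread: 5, 10"
        System.out.println("spread: " + (spread(h1) & (n - 1)) + ", " + (spread(h2) & (n - 1)));
    }
}
```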

2 Inserting elements with put

    /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     */
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
                   
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        
        // Assign tab = table, n = tab.length
        // If the table is empty, it must be initialized first
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length; // Note 1: resize (here it initializes the table)
            
        // If there is no node at the computed index i
        if ((p = tab[i = (n - 1) & hash]) == null)
            // Wrap the key-value into a Node and store it at index i
            tab[i] = newNode(hash, key, value, null); 
        // Hash collision: index i already holds a node p
        else {
            Node<K,V> e; K k;
            // p matches the key being inserted (same hash and equal key), so p is the node to overwrite
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p; // e = the existing node p at index i
            // If p is a red-black tree node
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); // Note 2
            else {
                // Walk the singly linked list; at iteration binCount, p is node number binCount + 1
                for (int binCount = 0; ; ++binCount) {
                    // If p is the last node of the list
                    if ((e = p.next) == null) {
                        // Wrap the key-value into a Node and append it to the tail
                        p.next = newNode(hash, key, value, null);
                        // The list held binCount + 1 nodes before this append;
                        // binCount >= 7 means it held at least 8 and now holds at least 9
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // Note 3: may resize or treeify
                        break;
                    }
                    // Found e, the node to overwrite
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e != null: an existing node for this key is overwritten
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Empty callback, overridden in LinkedHashMap
                afterNodeAccess(e);
                return oldValue;
            }
        }
        // Reaching here means a new node was inserted
        ++modCount;
        // Update size, then check whether a resize is needed
        if (++size > threshold)
            resize();
        // Empty callback, overridden in LinkedHashMap
        afterNodeInsertion(evict);
        return null;
    }

    // Create a regular (non-tree) node
    Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
        return new Node<>(hash, key, value, next);
    }
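Because the relation between binCount and the list length is easy to get off by one, the sketch below (class and method names hypothetical) replays only the counting logic of the loop in putVal, without real nodes. It shows that treeifyBin is reached when the bin already held 8 nodes, i.e. on the append that produces the 9th node.

```java
public class TreeifyTriggerDemo {
    static final int TREEIFY_THRESHOLD = 8; // same value as in HashMap

    // Replays the binCount loop of putVal for a bin that already holds `existing`
    // nodes (existing >= 1), none of which equals the new key.
    // Returns true if putVal would call treeifyBin after appending the new node.
    static boolean wouldCallTreeifyBin(int existing) {
        int position = 1; // p starts at the head, node number 1
        for (int binCount = 0; ; ++binCount) {
            if (position == existing) { // e = p.next == null: p is the tail
                // new node appended; the bin now holds existing + 1 nodes
                return binCount >= TREEIFY_THRESHOLD - 1;
            }
            position++; // p = e
        }
    }

    public static void main(String[] args) {
        for (int existing = 1; existing <= 10; existing++) {
            System.out.println(existing + " nodes before append -> treeifyBin: "
                    + wouldCallTreeifyBin(existing));
        }
    }
}
```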

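Simplified flow of HashMap.put(key, value) when inserting a mapping X (assume its key maps to index i):

If table is empty, call resize() first; then wrap X into a Node, store it at table[i], update modCount and size, and return null.
If table[i] is empty, wrap X into a Node, store it at table[i], update modCount and size (calling resize() if size now exceeds threshold), and return null.
If table[i] already holds a node p that is logically equal to X (same hash and equal key), overwrite p.value with the new value (p itself is kept) and return the old value.
If table[i] holds a TreeNode p, insert X into the tree via putTreeVal. If a logically equal node exists, overwrite its value and return the old value; otherwise return null.
Otherwise walk the list at table[i]. If a logically equal node is found, overwrite its value and return the old value; if not, wrap X into a Node and append it to the tail (calling treeifyBin if the bin already held at least 8 nodes), call resize() if size now exceeds threshold, and return null.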
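The return values described above can be observed through the public API. The sketch below (class name hypothetical) also calls putIfAbsent, which HashMap implements by calling putVal with onlyIfAbsent = true, so an existing non-null value is kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PutReturnDemo {
    // Records the return values of a few put calls on a fresh HashMap
    static List<String> run() {
        Map<String, String> map = new HashMap<>();
        List<String> out = new ArrayList<>();
        out.add(String.valueOf(map.put("k", "v1")));         // new mapping -> null
        out.add(String.valueOf(map.put("k", "v2")));         // overwrite -> old value v1
        out.add(String.valueOf(map.putIfAbsent("k", "v3"))); // onlyIfAbsent: keeps v2, returns v2
        out.add(String.valueOf(map.get("k")));               // still v2
        out.add(String.valueOf(map.put(null, "n")));         // null key (hash 0), new mapping -> null
        out.add(String.valueOf(map.size()));                 // two mappings
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, v1, v2, v2, null, 2]
    }
}
```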

3 Resizing the table with resize

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     */
    final Node<K,V>[] resize() {
        // oldTab = the table before resizing
        Node<K,V>[] oldTab = table;
        // oldCap = table.length before resizing (0 if not yet allocated)
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // oldThr = threshold before resizing
        int oldThr = threshold;
        int newCap, newThr = 0;
        // oldCap > 0: the table has already been initialized
        if (oldCap > 0) {
            // The current capacity has already reached the maximum
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                // Return the current table without resizing
                return oldTab;
            }
            // newCap = oldCap * 2, i.e. twice the current capacity
            // If newCap is below the maximum and oldCap >= the default capacity 16
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // newThr = oldThr * 2, i.e. twice the current threshold
                newThr = oldThr << 1; // double threshold
        }
        // The table is empty but threshold > 0: an initial capacity was given to the constructor and stored in threshold
        else if (oldThr > 0) // initial capacity was placed in threshold
            // The new capacity is the stored initial capacity
            newCap = oldThr;
        // The table is empty and threshold is 0
        else {               // zero initial threshold signifies using defaults
            // Capacity = default 16
            newCap = DEFAULT_INITIAL_CAPACITY;
            // Threshold = default load factor 0.75f * default capacity 16 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr is still 0 if the capacity came from the stored initial capacity,
        // or if the threshold was not doubled (oldCap < 16, or newCap reached the maximum)
        if (newThr == 0) {
            // Compute the new threshold from the new capacity and the load factor
            float ft = (float)newCap * loadFactor;
            // Clamp to Integer.MAX_VALUE when the capacity or threshold would reach the maximum
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update threshold
        threshold = newThr;
        // Allocate the new Node array newTab with the new capacity
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Point table at the new array
        table = newTab;
        // If oldTab exists, move its nodes into newTab
        if (oldTab != null) {
            // Iterate over every bin of the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
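                // e = the current bin oldTab[j]
                if ((e = oldTab[j]) != null) {
                    // Clear oldTab[j] to help GC
                    oldTab[j] = null;
                    // The bin holds a single node (no collision)
                    if (e.next == null)
                        // Put the node directly into newTab
                        newTab[e.hash & (newCap - 1)] = e;
                    // Collision, and the bin has been converted into a tree (TreeNode)
                    else if (e instanceof TreeNode)
                        // Not covered here: splits the tree bin into low and high parts
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Collision, and the bin is a linked list with more than one node
                    else { // preserve order
                        // Because the capacity doubles,
                        // each node of the original list either
                        // stays at the original index j (the "low" position)
                        // or moves to the new index (the "high" position),
                        // where high = low + oldTab.length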
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // (e.hash & oldCap) == 0: the new index equals j (< oldCap), so the node goes to the low list;
                            // otherwise it goes to the high list
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        // Loop until the end of the list
                        } while ((e = next) != null);

                        // Put the low list at the original index j
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // Put the high list at the new index j + oldCap
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
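The capacity/threshold arithmetic at the top of resize() can be isolated into a pure function. The sketch below copies that branch logic (class and method names are hypothetical, and the node-moving half is omitted) so the resulting sizes can be printed. A default map goes 16/12, 32/24, 64/48, ...; a map whose constructor stored an initial capacity of 4 in threshold gets 4/3 and then 8/6, because below capacity 16 the threshold is recomputed from the load factor rather than doubled.

```java
public class ResizeSizingSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Replays the sizing half of resize(): returns {newCap, newThr}.
    // At maximum capacity the table is not resized, so {oldCap, Integer.MAX_VALUE} is returned.
    static int[] nextSizing(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                return new int[] {oldCap, Integer.MAX_VALUE};
            } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY
                    && oldCap >= DEFAULT_INITIAL_CAPACITY) {
                newThr = oldThr << 1;              // double the threshold
            }
        } else if (oldThr > 0) {
            newCap = oldThr;                       // initial capacity stored in threshold
        } else {
            newCap = DEFAULT_INITIAL_CAPACITY;     // defaults: 16 and 12
            newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {                         // threshold not set above: compute it
            float ft = (float) newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY
                    ? (int) ft : Integer.MAX_VALUE);
        }
        return new int[] {newCap, newThr};
    }

    public static void main(String[] args) {
        int cap = 0, thr = 0;                      // state of a map created with new HashMap<>()
        for (int i = 0; i < 4; i++) {
            int[] next = nextSizing(cap, thr, DEFAULT_LOAD_FACTOR);
            cap = next[0];
            thr = next[1];
            System.out.println("capacity=" + cap + " threshold=" + thr);
        }
    }
}
```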

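Note:
When resize() grows the table, it doubles table.length, which in binary is a left shift by one bit.
For example, table.length = 16 becomes 32.
In binary:

Before (16): 0001 0000
After (32): 0010 0000

So newCap - 1 has exactly one more 1-bit than oldCap - 1, and that bit has the value oldCap. This is why (e.hash & oldCap) alone decides whether a node stays at index j or moves to j + oldCap.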
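The lo/hi rule from resize() can be checked against a direct recomputation of the index. In this sketch (class and method names hypothetical), four hashes that share old bin 5 are redistributed with the (hash & oldCap) test and compared with hash & (newCap - 1).

```java
public class SplitDemo {
    // Index of a node after resize, using the lo/hi rule from resize():
    // stay at j if (hash & oldCap) == 0, otherwise move to j + oldCap
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1); // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53}; // all four share old bin 5
        for (int h : hashes) {
            System.out.println(h + ": old=" + (h & (oldCap - 1))
                    + " new=" + newIndex(h, oldCap)
                    + " direct=" + (h & (newCap - 1)));
        }
    }
}
```

Since no node ever moves anywhere except j or j + oldCap, resize can rebuild each bin as two lists while preserving the original relative order.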

4 treeifyBin: converting a singly linked list into a red-black tree

    /**
     * Replaces all linked nodes in bin at index for given hash unless
     * table is too small, in which case resizes instead.
     */
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // The table is null or shorter than MIN_TREEIFY_CAPACITY (64): resize only
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        // index = the index computed from the given hash
        // e = tab[index]
        // Condition: e != null, i.e. the bin for this hash is not empty
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // hd holds the head and tl the tail of the doubly linked list built by the do-while below
            TreeNode<K,V> hd = null, tl = null;
            // Walk the singly linked list, wrap each Node into a TreeNode, and link them into a doubly linked list in preparation for treeification
            do {
                // Wrap Node e into a TreeNode
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            // tab[index] = hd
            // hd != null means treeification is needed; hd is now the head of a doubly linked list of TreeNodes
            if ((tab[index] = hd) != null)
                // Convert the doubly linked list into a red-black tree
                hd.treeify(tab);
        }
    }

    // For treeifyBin
    TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
        return new TreeNode<>(p.hash, p.key, p.value, next);
    }
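The interplay between TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY can be simulated without touching HashMap's package-private classes. The model below is a simplified sketch (names hypothetical): it assumes every key has the same hash, so a resize cannot spread the nodes out, and it tracks only the table length and the size of that one bin. Starting from a 16-slot table, the 9th and 10th inserts trigger resizes (16 to 32 to 64) and the 11th finally treeifies.

```java
public class TreeifyBinSimulation {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Simplified model: every inserted key has the same hash, so all nodes share one bin.
    // Returns the insertion number on which the bin is treeified.
    // Size-based resizes (size > threshold) are ignored; they do not occur
    // in this model as long as the table starts at 16 or more.
    static int insertsUntilTreeified(int tableLength) {
        int binSize = 0;
        for (int inserted = 1; ; inserted++) {
            boolean heldAtLeast8 = binSize >= TREEIFY_THRESHOLD; // putVal's binCount check
            binSize++;                                           // new node appended
            if (heldAtLeast8) {                                  // putVal calls treeifyBin
                if (tableLength < MIN_TREEIFY_CAPACITY) {
                    tableLength <<= 1;                           // resize() instead of treeify
                } else {
                    return inserted;                             // hd.treeify(tab)
                }
            }
        }
    }

    public static void main(String[] args) {
        for (int len : new int[] {16, 32, 64}) {
            System.out.println("table " + len + ": treeified on insert #" + insertsUntilTreeified(len));
        }
    }
}
```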

treeify is an instance method of TreeNode:

        /**
         * Forms tree of the nodes linked from this node.
         */
        final void treeify(Node<K,V>[] tab) {
            TreeNode<K,V> root = null;
            // this is the TreeNode on which the method is invoked
            // Walk the doubly linked list starting at this and build the red-black tree
            for (TreeNode<K,V> x = this, next; x != null; x = next) {
                next = (TreeNode<K,V>)x.next;
                x.left = x.right = null;
                // root is still null (not built yet): make x the root
                if (root == null) {
                    x.parent = null;
                    // The root is black
                    x.red = false; 
                    root = x;
                }
                // The root already exists: insert the remaining nodes below it
                else {
                    K k = x.key;
                    int h = x.hash;
                    Class<?> kc = null;
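                    // Loop until x has been inserted into the red-black tree
                    for (TreeNode<K,V> p = root;;) {
                        int dir, ph;
                        K pk = p.key;
                        // If x's hash is less than p's hash
                        if ((ph = p.hash) > h)
                            // dir = -1: continue to the left of p
                            dir = -1;
                        // x's hash is greater than p's hash
                        else if (ph < h)
                            // dir = 1: continue to the right of p
                            dir = 1;
                        // Equal hashes: compare the keys with compareTo if they share a Comparable class;
                        // if that is not possible or yields 0, tieBreakOrder decides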
                        else if ((kc == null &&
                                  (kc = comparableClassFor(k)) == null) ||
                                 (dir = compareComparables(kc, k, pk)) == 0)
                            dir = tieBreakOrder(k, pk);

                        TreeNode<K,V> xp = p;
                        // p = p.left or p.right
                        // If that child is null, x can be placed here
                        if ((p = (dir <= 0) ? p.left : p.right) == null) {
                            x.parent = xp;
                            if (dir <= 0)
                                xp.left = x;
                            else
                                xp.right = x;
                            // After inserting x, rebalance so the tree again satisfies the red-black properties
                            root = balanceInsertion(root, x);
                            break;
                        }
                    }
                }
            }
            // Ensures that the given root is the first node of its bin
            moveRootToFront(tab, root);
        }
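From the outside, treeification can be exercised with a key type whose hashCode is constant and which does not implement Comparable, so ordering between equal hashes falls back to tieBreakOrder. The sketch below (the BadKey class and method names are hypothetical) inserts enough colliding keys for the bin to be treeified once the table has grown to 64, then checks that every key can still be found. It verifies correctness only and does not inspect the internal TreeNode structure.

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeysDemo {
    // Hypothetical key type: constant hashCode and not Comparable,
    // so equal-hash ordering inside a tree bin falls back to tieBreakOrder
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every key lands in the same bin
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Inserts `count` colliding keys, then checks that each one is found with its value
    static boolean allRetrievable(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        for (int i = 0; i < count; i++) {
            Integer v = map.get(new BadKey(i));
            if (v == null || v != i) {
                return false;
            }
        }
        return map.size() == count;
    }

    public static void main(String[] args) {
        System.out.println("all 100 colliding keys found: " + allRetrievable(100));
    }
}
```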

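References

[1]. Detailed explanation of the hash algorithm in HashMap (the perturbation function)
[2]. (Part 9) Deep dive into concurrent programming: concurrent containers, with the principles of blocking queues, copy-on-write containers and lock-striped containers
[3]. Interview essentials: HashMap source code analysis (JDK 8)
[4]. Understanding HashMap in depth (Part 1): HashMap source code analysis (JDK 1.8)
[5]. java.util.HashMap (Java 8)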
