HashMap基本原理

最新推荐文章于 2023-09-01 06:02:01 发布

北绿蚁

最新推荐文章于 2023-09-01 06:02:01 发布

阅读量165

点赞数

分类专栏：数据结构文章标签： java 数据结构

本文链接：https://blog.csdn.net/weixin_44691915/article/details/121136434

版权

数据结构专栏收录该内容

4 篇文章 0 订阅

订阅专栏

一、简介

HashMap 是一个散列表，存储键值对 (key-value) 的映射。
HashMap 根据键的 hashCode() 值来存储，访问速度会很快。
允许 key，value 值为null（key 只能有一个为 null，value 可以存在多个null）。
HashMap 线程不安全。
HashMap 是无序的，即不记录插入的顺序。
HashMap 继承于AbstractMap，实现了 Map、Cloneable、java.io.Serializable 接口。

图1 HashMap的实现与继承

二、基本原理

HashMap 的实现在 JDK1.7 是数组+链表，到了 JDK1.8 后使用数组+链表+红黑树。

图2 HashMap的存储结构

HashMap 的数组table 长度是有限的，使用 hash函数计算出来的下标可能相同（即发生冲突），这时使用链表来解决冲突（拉链法）。每个数组元素都是链表的头节点，当发生冲突的时候，新来的Node<K,V>会映射到链表对应的位置。

举个栗子，我们new 一个 HashMap，然后插入一些值，然后进行 debug 来查看里面存储了什么。

HashMap<String, Integer> map = new HashMap<String, Integer>();
        map.put("语文", 1);
        map.put("数学", 2);
        map.put("英语", 3);
        map.put("历史", 4);
        map.put("政治", 5);
        map.put("地理", 6);
        map.put("生物", 7);
        map.put("化学", 8);

        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue());
        }
        System.out.println("---");
    }

如图3所示，当最后插入map.put("化学", 8);时，与前面的元素发生了冲突，则查看该节点的 next 节点是否为 null，是的话就直接插入。

JDK1.7 使用的是头插法，即插在链表的头部，因为 HashMap 的设计者认为新插入的 Node 节点被查询的可能性更高。
JDK1.8 使用的是尾插法，是为了避免头插法带来的死循环问题。且用了红黑树后只能使用尾插法。

图3 HashMap存储Node

modCount：记录 HashMap 内部结构发生变化的次数，主要用于迭代的快速失败（fast-fail）。需要注意的是，内部结构发生变化指的是结构发生变化，例如 put 新键值对时，++modCount ，但是如果是某个 key对应的 value 值被覆盖，不属于结构变化（源码 putVal 方法中是直接返回旧value 值），所以 modCount 的值不发生变化。
MIN_TREEIFY_CAPACITY：链表转化为红黑树时，数组 table 长度要大于最小树化容量（64）。如果表长小于64，则执行 resize() 对数组 table 扩容。

当链表的长度等于链表转树的阈值TREEIFY_THRESHOLD（8） 时，且表长不小于 MIN_TREEIFY_CAPACITY（64），链表会转化为红黑树，以提升查询效率。

图4 HashMap链表转红黑树

那么这些节点的如何通过 hash函数来计算出下标的呢？

首先，调用了 hashCode() 函数，然后将其值的高16位和低16位进行了异或(XOR)

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

这个 hash() 函数的大概意思是，key 调用了 hashCode() 函数得到值h 后，保持高16bit不变，然后对高16bit和低16bit 进行异或(XOR)，从而得到 hash值。

然后我们根据 hash值和数组表长减1（n-1）进行与运算，得出数组下标。

图5 计算数组下标

值得思考的是，为什么 HashMap 的数组容量（n）是 2 的次幂呢？
( n - 1 ) 通过二进制表示，其尾端都是以 1 表示（0000 1111）。
当 ( n - 1 ) & hash 时，保留的是 hash 值二进制位中的 1（如上图图5）。
这样刚好满足，( n - 1 ) & hash = hash % n。

使用 & 运算有以下优势。

& 运算比 % 取模运算速度快；
能保证索引值肯定在数组容量 capacity 中，不会超出数组长度。

如果我们给定的初始数组容量 initialCapacity 不是 2 次幂呢？
HashMap 在构造的时候会调用 tableSizeFor(initialCapacity) 来返回给定目标容量的 2 次幂。

 /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

三、put 的具体实现

putVal 方法为主要执行方法，其源码如下所示。

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /** 
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab;  // 存放数组table	
        Node<K,V> p;   // 存放节点
        int n, i; // n 存放表长; i 存放key经过hash计算后的得到的数组下标
        // 判断table是否已初始化，否则进行初始化table操作
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // 1.若p节点该位置中无元素。
        // 根据hash值 确定 待插入节点在数组中的插入的位置 p节点，即计算索引存储的位置;
        // 若 p中无元素则直接进行插入
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            // 2.如果待插入位置节点p中存在元素
            Node<K,V> e; K k;
            // 对比p节点中存在的元素 与待插入元素的hash值和key值，执行赋值操作（key已经存在，直接覆盖value）
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 3. 判断该元素是否为红黑树节点
            else if (p instanceof TreeNode)
                // 红黑树节点则调用putTreeVal()函数进行插入
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // 4. 若该元素是链表，且为链表头节点，则从此节点开始向后寻找合适的插入位置
                for (int binCount = 0; ; ++binCount) {
                	// 4.1 循环遍历链表，直到找到节点为null 的位置，然后插入，退出循环
                    if ((e = p.next) == null) { 
                        // 找到插入位置后，新建节点插入
                        p.next = newNode(hash, key, value, null);
                        // 注意：若链表上节点超过TREEIFY_THRESHOLD - 1，即链表长度为8，将链表转变为红黑树
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // 这里会判断数组长度是否小于MIN_TREEIFY_CAPACITY（64），小于则扩容resize()
                            treeifyBin(tab, hash); 
                        break;
                    }
                    // 4.2 循环遍历链表，若该节点中元素的key已存在（key已经存在，直接覆盖value），退出循环
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e; // 当前位置节点
                }
            }
            // key值已经存在（执行的覆盖操作），返回旧value值
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                // 若节点的key已存在，则返回旧value值
                return oldValue;
            }
        }
        // 记录修改次数（注意：如果只是覆盖value，则直接return 了 oldValue）
        ++modCount;
        // 判断HashMap中存在的元素个数 size 是否 超过 扩容阈值threshold，超过则扩容数组taable。
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

现在，我们将putVal方法执行的流程视图化，以便理解与回顾。

图6 putVal 方法执行流程

四、get 的具体实现

public V get(Object key) {
     Node<K,V> e;
     return (e = getNode(hash(key), key)) == null ? null : e.value;
 }

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // 1.第一个节点就找到
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 下一个节点是否存在
        if ((e = first.next) != null) {
        	// 2.如果该节点为 TreeNode 类型，则通过 getTreeNode() 方法去获取
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // 3.遍历链表中查找
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

图7 get 方法执行流程

五、resize 的具体实现

/**
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
*  初始化或加倍表大小。如果为空，则根据字段阈值中持有的初始容量目标进行分配。
*  否则，因为我们使用的是 2 的幂扩展，所以每个 bin 中的元素必须保持相同的索引，或者在新表中以 2 的幂的偏移量移动。
*
* @return the table
*/
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;  // 定义新的表容量、扩容阈值
    // 1.若旧数组容量不为空
    if (oldCap > 0) {
         // 1.1 若旧数组容量大于等于最大容量值，扩大扩容阈值至最大值，【并返回旧数组】
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // 1.2 若旧数组扩容为原来的2倍后小于最大容量值，且旧数组容量大于等于默认初始值容量，则扩容阈值也变为原来的2倍
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    // 2.若旧扩容阈值大于 0（此时 oldCap = 0，即旧数组为空），则将旧扩容阈值 赋值给 新的数组容量
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    // 3. 否则（扩容阈值，数组容量都为 0），新数组容量、新扩容阈值使用默认值
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // 走了第 2 步时，newThr 为 0，需要进行计算赋值
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        // 新建一个新的数组table
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // 若旧数组不为空，把旧数组oldTab的节点 映射到 新数组newTab中
    if (oldTab != null) {
        // 循环遍历旧数组oldTab
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e; // 当前节点
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // 若是单个节点，即没有后继next节点，则直接在newTab在进行重定位
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // 若节点为TreeNode，则需要对红黑树进行拆分
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // 否则节点为链表；链表重组并保持原有顺序
                else { // preserve order
                	// low，低位头、尾节点；记录链表中位置不变的节点
                    Node<K,V> loHead = null, loTail = null;
                    // high，高位头、尾节点；记录链表中位置改变的节点；扩容后容量翻倍，high = low + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next; // 存放 e 的下一个节点
                    do {
                        next = e.next;
                        // 若为原位置（如下图8所示）
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // 若为新位置
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        } // 循环直到当前数组下标下的链表遍历完
                    } while ((e = next) != null);
                    // 将低位的链表放到新的数组中，位置不变
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // 将高位的链表放到新的数组中，索引位置为 （原位置 + oldCap）
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

源码中使用 e.hash & oldCap) == 0 来判断扩容后节点的索引位置是否发生改变，其详细如下图8所示。

图8 判断扩容后节点位置是否变化

图9 扩容后节点索引位置变化

图10 HashMap中数组16扩容至32

六、总结

HashMap的初始表长initialCapacity默认为16；
负载因子loadFactor默认值为0.75；
threshold 扩容阀值（threshold = capacity*loadFactor）；
size为HashMap中存在节点的个数；
链表节点数达到树化阈值TREEIFY_THRESHOLD(8) 且表长不小于最小树化容量MIN_TREEIFY_CAPACITY(64)时，转化为红黑树，否则执行 resize() 方法进行扩容，且扩容为之前数组容量的两倍，所以指定一个合适的初始值能减少扩容次数；
JDK1.7采用数组+链表，JDK1.8 采用数组+链表+红黑树；
JDK1.7采用头插法，JDK1.8 采用尾插法（解决了头插法造成的死循环问题）；
modCount 是用来记录HashMap内部结构发生变化的次数，put方法覆盖HashMap中的某个key对应的value不属于结构变化；
HashMap 是非线程安全的，非并发场景使用HashMap，并发场景可以使用Hashtable，但是推荐使用ConcurrentHashMap（锁粒度更低、效率更高）。