HashMap Source Code Analysis

阿瓦达啃小瓜

Article link: https://blog.csdn.net/hermi0ne/article/details/115626288

1. HashMap initialization

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

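This is the initial length of HashMap's Node array. A HashMap created with the no-argument constructor does not allocate the array, so the Node array is still null. It is allocated on the first put, and its length is then DEFAULT_INITIAL_CAPACITY.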

transient Node<K,V>[] table;    // the Node (bucket) array

public HashMap() {
    // the constructor does not allocate the table array
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0) {
        // on the first put table == null, so resize() allocates the array and its length is stored in n
        n = (tab = resize()).length;
    }
    // ...
}


final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // ..
    }
    else if (oldThr > 0) { // initial capacity was placed in threshold
        // ..
    }
    else {
        // Initially table == null && threshold == 0, so this branch is taken:
        // DEFAULT_INITIAL_CAPACITY (16) becomes the initial array size
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // ...
}
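The sketch below makes the lazy allocation concrete. It is not the JDK code. The class `LazyInitSketch` is hypothetical and models only the path described above. Its constructor leaves `table` null. The first `put` then takes the "zero initial threshold" branch of `resize()`, which allocates 16 slots and sets the threshold to 0.75 * 16 = 12. Hashing and bucket placement are left out.

```java
class LazyInitSketch {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16, as in HashMap
    static final float DEFAULT_LOAD_FACTOR = 0.75f;      // HashMap's default

    Object[] table;  // hypothetical stand-in for HashMap.table: null until the first put
    int threshold;   // 0 until the first allocation

    // Models only the "zero initial threshold" branch of resize()
    Object[] firstAllocation() {
        int newCap = DEFAULT_INITIAL_CAPACITY;
        threshold = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        table = new Object[newCap];
        return table;
    }

    void put(Object value) {
        Object[] tab = table;
        if (tab == null || tab.length == 0) {
            tab = firstAllocation(); // allocation happens here, not in the constructor
        }
        // hashing and bucket placement are omitted in this sketch
    }

    public static void main(String[] args) {
        LazyInitSketch m = new LazyInitSketch();
        System.out.println(m.table == null); // true: nothing allocated yet
        m.put("x");
        System.out.println(m.table.length);  // 16
        System.out.println(m.threshold);     // 12
    }
}
```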

2. The hash algorithm

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

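h >>> 16 shifts h right by 16 bits (unsigned shift). An int is 32 bits wide, so the expression XORs the high 16 bits of h into its low 16 bits. The goal is to reduce hash collisions.

The length of HashMap's array is always a power of two. Reducing hash() modulo the array length therefore just keeps its lowest few bits. With an array length of 16, for example, only the last 4 bits of hash() are used. The high-order bits are masked off, so any difference between two keys that lies only in their high bits is lost when the slot is computed. XORing the high half into the low half mixes the high-order bits into the low-order bits, which helps reduce hash collisions.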
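The demonstration below assumes a table length of 16. The hash codes 0x10000, 0x20000 and 0x30000 differ only in bits above the low 4. Masking them directly sends all three to index 0. After the same `h ^ (h >>> 16)` transform that `HashMap.hash()` applies, they land in indexes 1, 2 and 3. The class name `HashSpreadDemo` and the sample values are illustrative.

```java
class HashSpreadDemo {
    // Same transform as HashMap.hash(): XOR the high 16 bits into the low 16 bits
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length (a power of two): only the low 4 bits choose the slot
        int[] codes = {0x10000, 0x20000, 0x30000}; // differ only in bits 16 and 17
        for (int h : codes) {
            int rawIndex = h & (n - 1);            // high bits masked off
            int spreadIndex = spread(h) & (n - 1); // high bits mixed into the low bits first
            System.out.printf("h=0x%X raw index=%d spread index=%d%n", h, rawIndex, spreadIndex);
        }
    }
}
```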

3. Resizing

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;    // number of key-value pairs in the map; once it exceeds threshold (capacity * load factor), the table is resized

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

/**
 * Initializes or doubles the table size
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // newCap = oldCap * 2
            // newThr = oldThr * 2
            newThr = oldThr << 1;
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];   // after the first allocation, the new array is twice as long as the old one
    table = newTab;                                        // table now refers to the resized array
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {   // visit each non-empty bucket of the old array
                oldTab[j] = null;
                if (e.next == null)
                    // only one node in this bucket (the head), so move it directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // red-black tree bin: split it between index j and j + oldCap
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // loHead/loTail: head and tail of the list that stays at index j
                    Node<K,V> loHead = null, loTail = null;
                    // hiHead/hiTail: head and tail of the list that moves to index j + oldCap
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            // (e.hash & oldCap) == 0: after resizing, e stays in the list at index j
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            // otherwise e moves to the list at index j + oldCap
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

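size is the number of key-value pairs currently stored in the HashMap. The table is resized when size exceeds threshold, where threshold = capacity * loadFactor. loadFactor defaults to 0.75, and capacity is table.length.

The HashMap constructor lets you choose your own loadFactor, but the default is fine in most cases. The initial capacity, on the other hand, is usually worth setting up front so the table does not have to resize repeatedly.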
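A common way to choose the initial capacity is to divide the expected number of entries by the load factor, so that filling the map never crosses the threshold. The sketch below assumes the default load factor of 0.75. `capacityFor` and `roundUpToPowerOfTwo` are hypothetical helpers. The second one reproduces the rounding HashMap applies to a requested capacity, since the table length is always rounded up to the next power of two.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    static final float LOAD_FACTOR = 0.75f; // assumed: HashMap's default load factor

    // Hypothetical helper: smallest requested capacity whose threshold
    // (capacity * LOAD_FACTOR) is at least expectedSize
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / (double) LOAD_FACTOR);
    }

    // Next power of two >= cap: the rounding HashMap applies to a requested
    // initial capacity (valid for 1 <= cap <= 2^30)
    static int roundUpToPowerOfTwo(int cap) {
        if (cap <= 1) return 1;
        return Integer.highestOneBit(cap - 1) << 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int cap = capacityFor(expected);
        int tableLength = roundUpToPowerOfTwo(cap);
        System.out.println("requested capacity: " + cap);                         // 134
        System.out.println("table length: " + tableLength);                      // 256
        System.out.println("threshold: " + (int) (tableLength * LOAD_FACTOR));   // 192

        Map<String, Integer> map = new HashMap<>(cap); // 256 slots are allocated on the first put
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println("size: " + map.size());                               // 100
    }
}
```

With 100 expected entries, the requested capacity is 134. HashMap rounds this up to a table of 256 slots with a threshold of 192, so no resize happens while the 100 entries are inserted. A table of 128 would not be enough, because its threshold is only 96. On JDK 19 and later, `HashMap.newHashMap(int numMappings)` does this calculation for you.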

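The array length is always a power of two: it starts at 16 and doubles on every resize. After each resize, every node in the old linked lists (or red-black trees) has to be moved to its bucket in the new array.

In JDK 1.7, transfer() recomputed each entry's index as hash & (newCapacity - 1) and inserted the entry at the head of its new bucket. JDK 1.8 goes further and exploits the power-of-two length. The index is hash & (n - 1), and doubling n adds exactly one more bit of the hash to the index. For example, if the old length is 16 (binary 10000), the new length is 32 (binary 100000). Then hash & (32 - 1) = (hash & 1111) + (hash & 10000), which is oldIndex + (hash & oldCap).

Checking (hash & oldCap) == 0 is therefore enough to find the new index. If it is 0, the node stays at index j; otherwise it moves to j + oldCap. Each old list is split into a "lo" list and a "hi" list, and both keep the original node order.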
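The split rule can be checked with a few sample hashes. Assume oldCap = 16. The hashes 5, 21, 37 and 53 all share old index 5. The sketch below verifies that each new index is either the old index or old index + oldCap, exactly as `hash & oldCap` predicts. It then collects the hashes into lo and hi lists in their original order, as `resize()` does. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Splits the hashes of one old bucket the way resize() does:
    // (hash & oldCap) == 0 -> stays at j ("lo"), otherwise -> j + oldCap ("hi").
    // Throws if the shortcut ever disagrees with the full hash & (newCap - 1).
    static List<List<Integer>> split(int[] hashes, int oldCap) {
        int newCap = oldCap << 1;
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int predicted = (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            if (predicted != (h & (newCap - 1))) {
                throw new IllegalStateException("shortcut failed for hash " + h);
            }
            if ((h & oldCap) == 0) lo.add(h); else hi.add(h); // appending keeps the original order
        }
        return List.of(lo, hi);
    }

    public static void main(String[] args) {
        int[] bucket = {5, 21, 37, 53}; // all map to index 5 when oldCap = 16
        List<List<Integer>> parts = split(bucket, 16);
        System.out.println("stay at index 5:  " + parts.get(0)); // [5, 37]
        System.out.println("move to index 21: " + parts.get(1)); // [21, 53]
    }
}
```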

4. put

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        // initially table == null, so resize() first allocates an array of the default length (16)
        n = (tab = resize()).length;
    
    // fast index computation: n is the array length (16 at first), so (n - 1) & hash keeps the low bits of hash
    if ((p = tab[i = (n - 1) & hash]) == null)
        // the bucket at index i is empty, so the new key-value becomes the head of its list;
        // in other words, there is no hash collision
        tab[i] = newNode(hash, key, value, null);
    else {
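        // hash collision: the bucket is already occupied
        // e will point to the existing node whose key equals the given key (the node to overwrite);
        // e == null means a new insertion, e != null means an overwrite
        Node<K,V> e; K k;
        // p is the head node of the bucket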
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            // p's hash matches and p's key equals the given key:
            // this branch overwrites the value of an existing key
            e = p;
        else if (p instanceof TreeNode)
            // the bucket holds a red-black tree:
            // insert (key, value) into the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // the bucket holds a linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // reached the end of the list (p.next == null):
                    // create a new node and append it at the tail
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
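                        // binCount equals the list length before this insertion, minus 1
                        // binCount >= 7 means the list already held at least 8 nodes before the new node was appended, so convert it to a red-black tree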
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    // a node in the list has a key equal to the given key: overwrite it
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
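            // e != null: this is an overwrite, not an insertion
            // replace e's old value with value (unless onlyIfAbsent is set and the old value is non-null) and return the old value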
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);    // callback hook used by subclasses such as LinkedHashMap
            return oldValue;
        }
    }
    // reaching here means e was null, i.e. a new node was inserted
    ++modCount;
    if (++size > threshold)
        // after inserting, size is incremented; if the new size > threshold, resize
        resize();
    afterNodeInsertion(evict);
    return null;
}

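The array length n is always a power of two. For any non-negative hash, (n - 1) & hash therefore gives the same result as hash % table.length, and a bitwise AND is cheaper than a division. For a negative hash, Java's % yields a negative remainder, while the mask equals Math.floorMod(hash, n).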
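The following sketch compares the mask with Java's `%` and with `Math.floorMod` for a few sample hashes, assuming n = 16. It shows that the mask matches `%` only for non-negative hashes, while it always matches `Math.floorMod`. The class name is illustrative.

```java
class MaskVsModDemo {
    // Index computation used by HashMap; n must be a power of two
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = {0, 7, 16, 12345, Integer.MAX_VALUE, -1, -12345, Integer.MIN_VALUE};
        for (int h : hashes) {
            int mask = maskIndex(h, n);
            // For a power-of-two n, the mask always agrees with floorMod
            if (mask != Math.floorMod(h, n)) {
                throw new IllegalStateException("mismatch for " + h);
            }
            System.out.printf("hash=%d mask=%d %%=%d floorMod=%d%n", h, mask, h % n, Math.floorMod(h, n));
        }
    }
}
```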

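As the code above shows, put checks whether a linked list should become a red-black tree only when it inserts a new node. It does not check when it updates the value of an existing key.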
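The sketch below shows exactly when the treeify check fires by replaying the list branch of `putVal` on a bucket of distinct keys. It is a simplified model. The `Node` class, `bucketOf` and `appendTriggersTreeify` are hypothetical, and key comparison is left out, so every appended key is assumed to be new. The real `treeifyBin` may also resize instead of treeifying when the table is small.

```java
class TreeifyTriggerDemo {
    static final int TREEIFY_THRESHOLD = 8; // same value as HashMap

    // Minimal singly linked bucket node (hash and value omitted)
    static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Builds a bucket holding `size` nodes with keys 0..size-1 (size >= 1)
    static Node bucketOf(int size) {
        Node head = new Node(0, null), tail = head;
        for (int k = 1; k < size; k++) {
            tail.next = new Node(k, null);
            tail = tail.next;
        }
        return head;
    }

    // Same traversal as putVal's list branch, assuming `key` is not already present:
    // append at the tail and report whether treeifyBin would be called
    static boolean appendTriggersTreeify(Node head, int key) {
        Node p = head, e;
        for (int binCount = 0; ; ++binCount) {
            if ((e = p.next) == null) {
                p.next = new Node(key, null);
                return binCount >= TREEIFY_THRESHOLD - 1;
            }
            p = e;
        }
    }

    public static void main(String[] args) {
        for (int existing = 6; existing <= 8; existing++) {
            boolean fired = appendTriggersTreeify(bucketOf(existing), existing);
            System.out.println(existing + " nodes before append -> treeifyBin called: " + fired);
        }
    }
}
```

It prints false for 6 and 7 existing nodes and true for 8. The check therefore fires when the 9th node is appended to a bucket.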

5. Converting a linked list to a red-black tree

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // the table is still small (shorter than MIN_TREEIFY_CAPACITY, 64): resize instead of treeifying
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            // walk every node in the list and convert it into a corresponding TreeNode
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            // tl tracks the tail of the TreeNode list built so far; the current node p becomes the new tail
            tl = p;
        } while ((e = e.next) != null);
        // after the loop, hd is still a list, but now a doubly linked list of TreeNodes:
        // every node has been converted to a TreeNode, but the list is not yet a red-black tree
        if ((tab[index] = hd) != null)
            // turn the doubly linked TreeNode list into a red-black tree
            hd.treeify(tab);
    }
}
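The two steps of `treeifyBin` can be modeled separately. In the sketch below, `TreeLikeNode` is a hypothetical stand-in for `HashMap.TreeNode` that keeps only the `prev`/`next` links. The red-black tree balancing done by `treeify` is not modeled. `MIN_TREEIFY_CAPACITY` uses the value 64 from JDK 8's HashMap.

```java
class TreeifyBinSketch {
    static final int MIN_TREEIFY_CAPACITY = 64; // value used by JDK 8 HashMap

    // Plain bucket node: singly linked
    static final class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Hypothetical stand-in for HashMap.TreeNode: only the list links are modeled,
    // the red-black tree fields (parent, left, right, red) are omitted
    static final class TreeLikeNode {
        final int hash;
        TreeLikeNode prev, next;
        TreeLikeNode(int hash) { this.hash = hash; }
    }

    // treeifyBin resizes instead of treeifying when the table is too small
    static boolean resizesInstead(int tableLength) {
        return tableLength < MIN_TREEIFY_CAPACITY;
    }

    // Same loop shape as treeifyBin: copy each node and link it into a doubly linked list
    static TreeLikeNode toDoublyLinked(Node e) {
        TreeLikeNode hd = null, tl = null;
        for (; e != null; e = e.next) {
            TreeLikeNode p = new TreeLikeNode(e.hash);
            if (tl == null) {
                hd = p;        // first node becomes the head
            } else {
                p.prev = tl;   // link backward to the current tail
                tl.next = p;   // and forward from it
            }
            tl = p;            // p is the new tail
        }
        return hd;
    }

    public static void main(String[] args) {
        System.out.println("length 16 -> resize instead: " + resizesInstead(16)); // true
        System.out.println("length 64 -> resize instead: " + resizesInstead(64)); // false

        Node bucket = new Node(1, new Node(2, new Node(3, null)));
        TreeLikeNode tail = null;
        StringBuilder forward = new StringBuilder();
        for (TreeLikeNode p = toDoublyLinked(bucket); p != null; p = p.next) {
            forward.append(p.hash).append(' ');
            tail = p;
        }
        StringBuilder backward = new StringBuilder();
        for (TreeLikeNode p = tail; p != null; p = p.prev) {
            backward.append(p.hash).append(' ');
        }
        System.out.println("forward:  " + forward.toString().trim());  // 1 2 3
        System.out.println("backward: " + backward.toString().trim()); // 3 2 1
    }
}
```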
