HashMap Source Code Analysis, Part 1

hygge999

Published 2022-11-23 16:51:43

Article link: https://blog.csdn.net/z55947810/article/details/128003309

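Overview

HashMap's source code has been asked about to death in interviews. I remember that when I started looking for internships in the summer of my junior year, more than half of the companies asked about it.

When HashMap comes up, here is what we typically say:

The HashMap data structure in Java 1.7 versus 1.8: 1.7 uses an array plus linked lists, and 1.8 adds red-black trees.
The default load factor is 0.75.
When a linked list reaches length 8 and the array length is at least 64, the list is converted to a tree; when a tree shrinks to 6 or fewer elements, it degrades back to a linked list.
HashMap's capacity is always a power of two.
Linked-list insertion used head insertion in 1.7 and changed to tail insertion in 1.8.
Possibly a few words about the 1.8 resize mechanism.

Let's try to walk through the Java 8 HashMap source and explore its logic.

Personally, I find the Java 8 source very compact, though not especially readable.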

Some static constants

DEFAULT_INITIAL_CAPACITY

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default initial capacity is 16.

MAXIMUM_CAPACITY

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

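An int can hold values up to Integer.MAX_VALUE, i.e. 2^31-1, but HashMap requires the capacity to be a power of two, so MAXIMUM_CAPACITY is set to 2^30, the largest power of two an int can represent. Once the capacity has reached MAXIMUM_CAPACITY, a further resize does not grow the table; instead the threshold is set to Integer.MAX_VALUE so that no more resizes are triggered. The code is analyzed later, in the discussion of automatic resizing.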
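A quick check of int arithmetic shows why 2^30 is the ceiling. The sketch below is my own illustration: the class name MaxCapacityDemo and the helper nextCapacity are hypothetical, and nextCapacity is only a simplified model of how a resize treats a table that is already at the maximum, not the JDK method itself.

```java
final class MaxCapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Simplified model (an assumption, not JDK code): double the capacity
    // unless the table has already reached the maximum.
    static int nextCapacity(int oldCap) {
        return (oldCap >= MAXIMUM_CAPACITY) ? oldCap : oldCap << 1;
    }

    public static void main(String[] args) {
        System.out.println(1 << 30);               // 1073741824
        System.out.println(1 << 31);               // overflows to Integer.MIN_VALUE
        System.out.println(nextCapacity(16));      // 32
        System.out.println(nextCapacity(1 << 30)); // stays at 1073741824
    }
}
```

Shifting one more bit to the left lands on the sign bit, so no larger power of two fits in an int.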

DEFAULT_LOAD_FACTOR

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

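The default load factor is 0.75.

Until the HashMap reaches its maximum capacity, load factor * capacity is the actual upper limit on how many entries it holds before resizing. For example, a HashMap with capacity 16 has a threshold of 12 (16 * 0.75 = 12), and a resize is triggered once the number of key-value pairs exceeds 12.

The load factor exists to mitigate hash collisions. Imagine there were no load factor: with a HashMap of size 4096 into which you insert 4095 key-value pairs, you would very likely have many hash collisions, with large linked lists and trees, and inserting the 4096th pair would then trigger a very expensive resize.

Hence the load factor: it keeps the linked lists and trees from growing so large and spreads the entries more evenly, which greatly reduces hash collisions and also keeps each resize from being so costly.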
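The threshold arithmetic above can be sketched as follows. This is a simplified illustration under my own assumptions: the class and method names are hypothetical, and the special handling near the maximum capacity is ignored.

```java
final class ThresholdDemo {
    // Number of entries a table of the given capacity can hold before
    // a resize is triggered (simplified; ignores the maximum-capacity case).
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Prints 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96
        for (int cap = 16; cap <= 128; cap <<= 1) {
            System.out.println(cap + " -> " + threshold(cap, 0.75f));
        }
    }
}
```

With the default factor, a quarter of the buckets are still unused at the moment the table doubles.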

TREEIFY_THRESHOLD & UNTREEIFY_THRESHOLD

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;

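These are the thresholds for treeifying a linked list and for untreeifying it.

Reaching this threshold is not the only precondition for treeification. A list of length 8 is not necessarily converted; hitting the threshold merely invokes the treeifyBin() method, which then decides whether to actually treeify.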

MIN_TREEIFY_CAPACITY

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

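The minimum treeify capacity: the array (table) length must be at least 64.

So treeification has two preconditions:

The array length has reached 64
The linked-list length has reached 8

If the array has not reached 64 but one of its buckets holds a linked list that has reached length 8, a resize is performed instead.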
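The two preconditions can be summarized as a small decision function. This is a minimal sketch of the rule as described in this article, not the JDK's treeifyBin() itself; the class, the enum and the decide helper are hypothetical names.

```java
final class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    // Hypothetical helper: what happens to a bucket, given the length of
    // its linked list and the length of the table (array).
    static Action decide(int listLength, int tableLength) {
        if (listLength < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;   // list is still short
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;      // table too small: grow it instead
        }
        return Action.TREEIFY;         // both preconditions hold
    }

    public static void main(String[] args) {
        System.out.println(decide(3, 16));  // KEEP_LIST
        System.out.println(decide(8, 16));  // RESIZE
        System.out.println(decide(8, 64));  // TREEIFY
    }
}
```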

Data nodes

Node

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

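Java 7 uses Entry to represent each data node in a HashMap, while Java 8 uses Node. There is essentially no difference: both have the four fields key, value, hash and next.

However, Node is only used for the linked-list case; the red-black tree case requires TreeNode.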
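Because Node implements Map.Entry, its hashCode(), equals() and toString() can be observed from outside through entrySet(). The sketch below is my own illustration (the class name and the entryHash helper are hypothetical); it compares an entry taken from a HashMap with a standalone AbstractMap.SimpleEntry.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EntryContractDemo {
    // Same formula as Node.hashCode() in the listing above.
    static int entryHash(Object key, Object value) {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Map.Entry<String, Integer> fromMap = map.entrySet().iterator().next();
        Map.Entry<String, Integer> standalone = new AbstractMap.SimpleEntry<>("a", 1);

        System.out.println(fromMap.equals(standalone));              // true
        System.out.println(fromMap.hashCode() == entryHash("a", 1)); // true
        System.out.println(fromMap);                                 // a=1
    }
}
```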

TreeNode

/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // ... (remaining methods omitted)
}

There are too many methods here, so they are not all shown.

Constructors

Constructor 1

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

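The no-argument constructor sets the load factor to the default 0.75.

As for the default capacity of 16 mentioned in the Javadoc, it is applied during the resize that happens on the first put. The "empty HashMap" really refers to the table field, which has not been allocated yet.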
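The lazy-allocation idea can be sketched with a toy class. Everything here is hypothetical and deliberately simplified (no hashing spread, no collision handling, no resizing); it only shows a table that stays null until the first put.

```java
import java.util.Objects;

final class LazyTableDemo {
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 16

    private Object[] table; // null until the first put

    int capacity() {
        return (table == null) ? 0 : table.length;
    }

    // Toy put: allocates the table on first use. A colliding key simply
    // overwrites the slot, which a real HashMap would not do.
    void put(Object key, Object value) {
        if (table == null) {
            table = new Object[DEFAULT_INITIAL_CAPACITY];
        }
        int index = Objects.hashCode(key) & (table.length - 1);
        table[index] = value;
    }

    public static void main(String[] args) {
        LazyTableDemo demo = new LazyTableDemo();
        System.out.println(demo.capacity()); // 0
        demo.put("a", 1);
        System.out.println(demo.capacity()); // 16
    }
}
```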

Constructor 2

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

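When only one argument is passed, it is the initial capacity. The load factor is still set to the default 0.75.

The this(...) call delegates to our third constructor.

Constructor 3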

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

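With two arguments, the first is the initial capacity and the second is the load factor.

The initial capacity must not be negative, and the load factor must not be less than or equal to 0 or NaN; otherwise an IllegalArgumentException is thrown.

If the requested initial capacity is greater than MAXIMUM_CAPACITY, it is forcibly set to MAXIMUM_CAPACITY.

PS: There is a quirk here: loadFactor may be greater than 1. Probably nobody writes that, but it is not an error. When it is greater than 1, the map can hold more entries than it has buckets before it resizes, and at that point linked lists are unavoidable.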
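These argument checks are easy to try out. The sketch below is my own illustration; the class name and the rejected helper are hypothetical, while the HashMap constructor is the one listed above.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstructorArgsDemo {
    // Returns true if the constructor throws IllegalArgumentException.
    static boolean rejected(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, Integer>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejected(-1, 0.75f));                // true: negative capacity
        System.out.println(rejected(16, 0f));                   // true: non-positive load factor
        System.out.println(rejected(16, Float.NaN));            // true: NaN load factor
        System.out.println(rejected(Integer.MAX_VALUE, 0.75f)); // false: clamped, not rejected
        System.out.println(rejected(16, 2.0f));                 // false: > 1 is allowed

        // A load factor above 1 still gives a working map.
        Map<Integer, Integer> dense = new HashMap<>(4, 2.0f);
        for (int i = 0; i < 100; i++) {
            dense.put(i, i);
        }
        System.out.println(dense.size()); // 100
    }
}
```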

Then there is the tableSizeFor() method below.

tableSizeFor()

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

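As said above, HashMap's capacity is always a power of two, yet when we actually initialize one, the initial capacity we pass in can be any positive integer without causing an error. It is this method that normalizes the initial capacity for us.

The method above may look a bit like magic and can be confusing at first sight, so let's work out the principle behind it.

First, what characterizes a power of two, and a power of two minus 1?

// A power of two: a single 1 with 0s on both sides
000...0001000...000
    
// A power of two minus 1: consecutive 0s in the high bits, consecutive 1s in the low bits
000...000111...111

Of the two, the latter is clearly easier to construct. Let's take an example:

Take the decimal number 17 (2^4+1). If it is used as the initial capacity, the method must turn it into 32 (2^5). So how do we get from 17 to 32?

cap = 17        // binary: 0001 0001
n = cap - 1     // binary: 0001 0000
needed: 32      // binary: 0010 0000
target: 32 - 1  // binary: 0001 1111

You will notice that 0001 0000 -> 0001 1111 is much easier than 0001 0001 -> 0010 0000.
Now read this together with the tableSizeFor() method above.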

int n = cap - 1;  // binary: 0001 0000
n |= n >>> 1;     // binary: 0001 1000
n |= n >>> 2;     // binary: 0001 1110
n |= n >>> 4;     // binary: 0001 1111, the target value
........

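n |= n >>> 1 guarantees that the highest 1 is followed by another 1, so there are now at least two consecutive 1s.

n |= n >>> 2 guarantees that the two bits after those two 1s are also 1, so there are now at least four consecutive 1s.

And so on: after n |= n >>> 16 we are guaranteed a number of the form 000…000111…111, i.e. a power of two minus 1. Adding 1 in the return statement then yields a power of two. Subtracting 1 from cap at the start ensures that a cap which is already a power of two, such as 16, maps to itself rather than to the next power.

Why stop at 16? Because n is an int, which is 4 bytes, or 32 bits: the shifts 1, 2, 4, 8 and 16 together are enough to propagate the highest 1 across all lower bits.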
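To check this reasoning, the method can be copied into a small standalone class and run on a few inputs. The wrapper class TableSizeForDemo and the sample values are my own; the method body is the one quoted above.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same bit-smearing logic as the method quoted above.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        // Expected: 0 -> 1, 1 -> 1, 3 -> 4, 16 -> 16, 17 -> 32, 1000 -> 1024
        int[] samples = {0, 1, 3, 16, 17, 1000};
        for (int cap : samples) {
            int size = tableSizeFor(cap);
            // A power of two has exactly one bit set.
            System.out.println(cap + " -> " + size
                    + " (bits set: " + Integer.bitCount(size) + ")");
        }
    }
}
```

Note that 16 stays 16 thanks to the initial cap - 1, while 17 is rounded up to 32.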

Constructor 4

/**
 * Constructs a new <tt>HashMap</tt> with the same mappings as the
 * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
 * default load factor (0.75) and an initial capacity sufficient to
 * hold the mappings in the specified <tt>Map</tt>.
 *
 * @param   m the map whose mappings are to be placed in this map
 * @throws  NullPointerException if the specified map is null
 */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

This constructor takes a Map as its argument.

It sets the load factor to the default value and then calls putMapEntries() to do the work.
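A short usage sketch of this constructor, under my own naming (the class CopyConstructorDemo and the copyOf helper are hypothetical): the new map receives the same mappings but is independent of the source, and a null argument raises NullPointerException as the Javadoc states.

```java
import java.util.HashMap;
import java.util.Map;

final class CopyConstructorDemo {
    // Thin wrapper around the Map-accepting constructor.
    static <K, V> Map<K, V> copyOf(Map<? extends K, ? extends V> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) {
        Map<String, Integer> source = new HashMap<>();
        source.put("a", 1);
        source.put("b", 2);

        Map<String, Integer> copy = copyOf(source);
        copy.put("c", 3); // does not affect the source map

        System.out.println(source.size()); // 2
        System.out.println(copy.size());   // 3

        try {
            CopyConstructorDemo.<String, Integer>copyOf(null);
        } catch (NullPointerException e) {
            System.out.println("null map rejected");
        }
    }
}
```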

putMapEntries()

/**
 * Implements Map.putAll and Map constructor
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + 1.0F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

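This method is fairly simple, so I won't go into detail.

But it contains this line: float ft = ((float)s / loadFactor) + 1.0F;

The Alibaba Java coding guidelines recommend specifying an initial capacity when creating a HashMap, to reduce the cost of resizing.

That initial capacity can be computed the same way as this line of source, cast to int:

int initialCapacity = (int) (((float) s / loadFactor) + 1.0F);
// or
int initialCapacity = (int) Math.ceil(s * 1.0 / loadFactor);

Of course, in practice, if you cannot estimate up front roughly how many entries will be stored, it is hard to judge what initial value is appropriate.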
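The two formulas can be compared with a small helper class. This is a sketch under my own assumptions: the class and method names are hypothetical, and overflow for very large sizes is ignored.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Mirrors the expression used in putMapEntries().
    static int capacityLikeSource(int expectedSize, float loadFactor) {
        return (int) ((float) expectedSize / loadFactor + 1.0F);
    }

    // Alternative: smallest capacity whose threshold covers expectedSize.
    static int capacityCeil(int expectedSize, float loadFactor) {
        return (int) Math.ceil(expectedSize * 1.0 / loadFactor);
    }

    public static void main(String[] args) {
        int expected = 12;
        System.out.println(capacityLikeSource(expected, 0.75f)); // 17
        System.out.println(capacityCeil(expected, 0.75f));       // 16

        // Pre-sized map: 12 entries fit without any resize.
        Map<Integer, String> map = new HashMap<>(capacityCeil(expected, 0.75f));
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size()); // 12
    }
}
```

For 12 expected entries, the first formula gives 17, which tableSizeFor() rounds up to a table of 32, while the second gives exactly 16, whose threshold of 12 is just enough.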
