HashMap: Implementation Principles and Source Code Analysis

执于代码

Category: [Java Language]

Article link: https://blog.csdn.net/xiamaocheng/article/details/104377234

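HashMap is one of the most important containers in the JDK. It is implemented as an array plus linked lists and, in the ideal case, supports insertion, deletion, update, and lookup in O(1) time. This article explains how a hash table works, starting from the basics, and then analyzes parts of the HashMap source code.

1. Starting with Arrays
Arrays are probably the first data structure most of us learn. An array is a contiguous block of storage in memory, so the computer can compute the address of any element from the array's base address, the element size, and the index, which takes O(1) time.

The following code defines a simple Student class. Suppose we need to store 20 Student objects and want to find any of them by studentID in O(1) time.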

public class Student {
    public int studentID;
    public String name;

    public Student(int studentID, String name) {
        this.studentID = studentID;
        this.name = name;
    }
}
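If the studentID values of the 20 Student objects happen to be exactly 0 through 19, we can simply create a Student array named students of length 20 and place each object in the slot whose index equals its studentID, as shown in the figure below. To find the object whose studentID is 15, we then access students[15] directly.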

Student[] students = new Student[20];
Student stu0 = new Student(0, "stu0");
Student stu19 = new Student(19, "stu19");

students[stu0.studentID] = stu0;
students[stu19.studentID] = stu19;

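For convenience, we use key for the value we search by (here, studentID), value for what we are looking up (here, the Student object), and slot for each element of the array. A slot is uniquely identified by its array index. (Think of each array element as a slot waiting to be filled with a Student object.) The figure below shows how the Student objects are stored in the array.

2. Hash Functions and Hash Collisions
How can we keep the constant-time lookup of an array without wasting space? If studentID values are sparse (say 21, 140, and 1163), using the ID directly as the index would require an array of at least 1164 slots to hold a handful of objects. A natural idea is to define a function that maps studentID into the range 0~19, for example h(studentID) = studentID % 20. Such a function is called a hash function. Using it, we can store the Student objects with studentID 21, 140, and 1163 in the array, as shown in the figure below. Note that 140 % 20 = 0, so stu140 lands in the same slot as stu0 from the previous section. Two different keys mapping to the same slot is called a hash collision; HashMap handles it by hanging a linked list of nodes off each slot.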

Student stu21 = new Student(21, "stu21");
Student stu140 = new Student(140, "stu140");
Student stu1163 = new Student(1163, "stu1163");

students[stu21.studentID % 20] = stu21;
students[stu140.studentID % 20] = stu140;
students[stu1163.studentID % 20] = stu1163;
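
When two keys map to the same slot (for instance 0 and 140 under % 20), a plain array assignment silently overwrites the earlier object. One common remedy is chaining, where every slot holds a small list. The following is a minimal sketch of that idea only; the class ChainedStudentTable and its methods are hypothetical names introduced for illustration, not HashMap's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainedStudentTable {
    /** Hypothetical stand-in for the article's Student class. */
    public static class Student {
        public final int studentID;
        public final String name;

        public Student(int studentID, String name) {
            this.studentID = studentID;
            this.name = name;
        }
    }

    // One chain (linked list) per slot; colliding keys share a chain.
    private final List<List<Student>> slots = new ArrayList<>();

    public ChainedStudentTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            slots.add(new LinkedList<>());
        }
    }

    // Hash function from the text: studentID % capacity.
    // floorMod keeps the result non-negative even for negative IDs.
    private int indexFor(int studentID) {
        return Math.floorMod(studentID, slots.size());
    }

    public void put(Student s) {
        List<Student> chain = slots.get(indexFor(s.studentID));
        chain.removeIf(old -> old.studentID == s.studentID); // replace an equal key
        chain.add(s);
    }

    public Student get(int studentID) {
        for (Student s : slots.get(indexFor(studentID))) {
            if (s.studentID == studentID) {
                return s;
            }
        }
        return null; // no entry with this key
    }

    public static void main(String[] args) {
        ChainedStudentTable table = new ChainedStudentTable(20);
        table.put(new Student(0, "stu0"));
        table.put(new Student(140, "stu140")); // 140 % 20 == 0: same slot as stu0
        System.out.println(table.get(0).name);
        System.out.println(table.get(140).name);
    }
}
```

Both objects remain retrievable even though they share slot 0, at the cost of walking the chain during lookup.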

3. Constants in HashMap
Some of the more important constants in HashMap are listed below.

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
static final int TREEIFY_THRESHOLD = 8;
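The names largely explain what each constant does. DEFAULT_INITIAL_CAPACITY is the default initial capacity; 1 << 4 means 1 shifted left by four bits, which is 16. Since HashMap is implemented as an array plus linked lists, capacity here means the length of the internal array of a HashMap instance. If no initial capacity is passed to the constructor, the array length is determined by this default.

DEFAULT_LOAD_FACTOR is the default load factor. The load factor is the average number of nodes hanging off each slot, and can be computed as follows:

load factor = number of nodes / array length

Likewise, if no load factor is passed to the constructor, this default is used.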
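
To make the formula concrete, here is a small sketch that computes a load factor and the node count at which a table with the default settings becomes "full" (capacity times load factor). The class and method names are hypothetical and exist only for this illustration.

```java
public class LoadFactorDemo {
    // load factor = number of nodes / array length
    public static float loadFactor(int nodeCount, int arrayLength) {
        return (float) nodeCount / arrayLength;
    }

    // Number of nodes a table of the given capacity holds before it should grow.
    public static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 1 << 4;        // same value as DEFAULT_INITIAL_CAPACITY
        float defaultFactor = 0.75f;  // same value as DEFAULT_LOAD_FACTOR
        System.out.println(threshold(capacity, defaultFactor)); // 12
        System.out.println(loadFactor(12, capacity));           // 0.75
    }
}
```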

4. The Static Nested Class Node<K, V>
Node simply means a node; whenever this article says "node", it refers to a Node object. Node<K, V> implements the Map.Entry<K, V> interface.

static class Node<K,V> implements Map.Entry<K,V>
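Node<K, V> has four fields: hash, key, value, and next. Its main job is to wrap a key and its corresponding value in a single node, whose next field points to the following node, forming a singly linked list. The hash field is derived from the key's hash code, as described later.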
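
The shape of such a node can be sketched as follows. This is a simplified illustration with the same four fields, not the JDK class itself: it leaves out the Map.Entry methods, and the enclosing class NodeSketch and the helper chainLength are hypothetical.

```java
public class NodeSketch {
    /** Simplified node: one key-value pair plus a link to the next node. */
    public static class Node<K, V> {
        final int hash;   // cached hash of the key
        final K key;
        V value;
        Node<K, V> next;  // next node in the same slot, or null

        public Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the singly linked list starting at head and counts its nodes.
    public static int chainLength(Node<?, ?> head) {
        int count = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Node<Integer, String> tail = new Node<>(140, 140, "stu140", null);
        Node<Integer, String> head = new Node<>(0, 0, "stu0", tail);
        System.out.println(chainLength(head)); // 2
    }
}
```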

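5. HashMap Constructors and Resizing
The fields of HashMap are listed below. table is the Node array in which HashMap stores its nodes. size is the total number of nodes currently stored. threshold is the number of nodes the current array capacity is allowed to hold; it is the product of the load factor and the array capacity, and once size exceeds threshold the table is resized. loadFactor is the load factor.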
transient Node<K,V>[] table;
transient int size;
int threshold;
final float loadFactor;

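The constructor's main job is to initialize some of these fields. When the no-argument constructor is called, only the load factor is assigned. Note that the table array is not allocated at this point.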

public HashMap(int initialCapacity, float loadFactor) {
    // Checks for extreme inputs are omitted here
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

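Bit operations are cheap, so HashMap uses them in many places to maximize speed. tableSizeFor() is one example: it rounds the requested capacity up to a power of two. In two's complement, -1 is 0xFFFFFFFF (all bits set). Integer.numberOfLeadingZeros returns the number of zero bits preceding the highest set bit of cap - 1, and >>> is the unsigned right shift. Take cap = 16: cap - 1 = 0x0000000F has 28 leading zeros, so n = -1 >>> 28 = 0x0000000F, and the method returns n + 1, which is 16.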
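
The rounding step described above can be sketched as a standalone method. The body follows the approach used by recent JDK versions (shifting -1 by the number of leading zeros of cap - 1), but treat it as an illustration rather than a verbatim copy; the wrapper class TableSizeDemo is hypothetical.

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two that is >= cap (and at least 1).
    public static int tableSizeFor(int cap) {
        // -1 is all ones; shifting it right leaves ones only below the
        // highest set bit of (cap - 1), e.g. 0b1111 for cap = 16.
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32
        System.out.println(tableSizeFor(100)); // 128
    }
}
```

Subtracting 1 before the shift is what keeps an exact power of two such as 16 from being rounded up to 32.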

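Resizing is done by the resize() method. Because the code has to cover many cases, it contains a lot of if-else branches, but those are not the point here. What matters is that, in the common case, resize() builds a new array twice as long as the old one, moves the nodes from the old array into the new one, and returns the new array.

The move is carried out in a for loop, in which e instanceof TreeNode checks whether node e has already been converted into a red-black tree node.

Because the array capacity is always a power of two, the nodes in the list hanging off a given index in the old array can only end up in one of two slots in the new array: index or index + oldCap, where oldCap is the length of the old array. Accordingly, loHead is the head of the list for slot index, and hiHead is the head of the list for slot index + oldCap.

Deciding which list a node e is appended to also relies on a bit trick: if e.hash & oldCap is 0, e is appended at loTail; otherwise (the result is oldCap, that is, the tested bit is 1) it is appended at hiTail.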
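
The reason only two target slots are possible can be checked with a few lines of arithmetic. The sketch below is not the resize() code; it only reproduces the index rule with a hypothetical helper newIndex and compares it with recomputing the index from scratch in the doubled table.

```java
public class ResizeSplitDemo {
    // Index in the doubled table, derived from the old index and one extra bit.
    public static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // (hash & oldCap) tests the single bit that the larger mask adds.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all sit in slot 5 of a 16-slot table
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1); // index recomputed for 32 slots
            System.out.println(h + " -> " + newIndex(h, oldCap) + " (direct: " + direct + ")");
        }
    }
}
```

With oldCap = 16, hashes 5 and 37 stay in slot 5 while 21 and 53 move to slot 21, so the old chain is split into a "lo" list and a "hi" list without rehashing any key.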

6. The put() and get() Methods

The following code is used to walk through the put() method.

Student stu21 = new Student(21, "stu21");
HashMap<Integer, Student> map = new HashMap<>();
map.put(stu21.studentID, stu21);
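
stu21.studentID is an int, so before put() runs it must be boxed into an Integer. The compiler does this automatically.

put() calls putVal(). The hash() method it uses, shown below, is a static method that returns the key's hashCode() XORed with itself shifted right (unsigned) by 16 bits. The purpose is to mix the high 16 bits of the hash code into the low bits, since the slot index is computed as (n - 1) & hash and would otherwise depend only on the low bits.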

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
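
A small experiment shows why the high bits are folded in. In this sketch, spread applies the same XOR-shift to a raw int, and indexFor applies the (n - 1) & hash mask; both helper names and the class are hypothetical, chosen only for this illustration.

```java
public class HashSpreadDemo {
    // Same mixing step as hash(), applied to a raw hash code.
    public static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Slot index for a table of length n (n must be a power of two).
    public static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10000; // these two values differ only in their high 16 bits
        int b = 0x20000;
        // Without mixing, both fall into slot 0 of a 16-slot table.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16));
        // After mixing, the high bits influence the index: slots 1 and 2.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16));
    }
}
```
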
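
Below is the code of putVal(). HashMap does not allow duplicate keys, so when the key passed to put() equals the key of an existing node, the old value is overwritten by default. When onlyIfAbsent is true, the value is overwritten only if the existing value is null. put() passes false, so the new value always replaces the old one.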

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
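
The overwrite behaviour is visible from the public API. The sketch below uses java.util.HashMap directly; put() returns the previous value (or null), and putIfAbsent(), which to my understanding takes the onlyIfAbsent = true path, leaves an existing non-null value alone. The class PutOverwriteDemo and its run() helper are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PutOverwriteDemo {
    /** Returns the three call results followed by the value finally stored. */
    public static String[] run() {
        Map<Integer, String> map = new HashMap<>();
        String r1 = map.put(21, "stu21");           // no previous mapping: null
        String r2 = map.put(21, "stu21-v2");        // overwrites, returns "stu21"
        String r3 = map.putIfAbsent(21, "ignored"); // key present: nothing replaced
        return new String[] {r1, r2, r3, map.get(21)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
        // [null, stu21, stu21-v2, stu21-v2]
    }
}
```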

get() delegates to getNode(), shown below. It locates the slot with (n - 1) & hash, checks the first node, and then searches the rest of the bucket, either as a tree or by walking the linked list.

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
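
Because getNode() returns null when no node matches, get() returns null both for a missing key and for a key explicitly mapped to null. A short usage sketch with java.util.HashMap shows how containsKey() tells the two cases apart; the class GetLookupDemo and its build() helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class GetLookupDemo {
    public static Map<Integer, String> build() {
        Map<Integer, String> map = new HashMap<>();
        map.put(21, "stu21");
        map.put(140, null); // HashMap permits null values
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = build();
        System.out.println(map.get(21));          // stu21
        System.out.println(map.get(140));         // null: key present, value is null
        System.out.println(map.get(999));         // null: key absent
        System.out.println(map.containsKey(140)); // true
        System.out.println(map.containsKey(999)); // false
    }
}
```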

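Summary:

HashMap is implemented as an array plus linked lists and stores each key-value pair internally as a Node object. The array length is always a power of two, which makes it possible to use bit operations for speed. When key hashes are randomly distributed, the expected cost of insertion, deletion, update, and lookup is proportional to the load factor (loadFactor). When a list grows beyond 8 nodes, it is converted into a red-black tree (provided the array length is at least 64; otherwise the table is resized instead). When the total number of nodes exceeds threshold, the table is resized and the new array is twice as long as the old one.

Reference: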

https://blog.csdn.net/weixin_42466155/article/details/104310706?request_id=&utm_source=distribute.pc_category.none-task
