An Analysis of the HashMap Source Code in JDK 8

Thanks to Neal Gafter, Arthur van Hoff, Josh Bloch, and Doug Lea for giving us the HashMap utility class, which has made the work of countless Java developers easier.

No amount of source-code commentary is a substitute for reading the source yourself.

Comment Analysis

To make the source easier to read, the JDK developers have left us plenty of detailed comments, and reading them pays off handsomely when trying to understand the code. So let's start with the comments on HashMap in JDK 8!

Hash table based implementation of the Map interface.  This implementation provides all of the optional map
operations, and permits null values and the null key.  (The HashMap class is roughly equivalent to Hashtable, 
except that it is unsynchronized and permits nulls.)  This class makes no guarantees as to the order of the 
map; in particular, it does not guarantee that the order will remain constant over time.

Key points from the passage above:
(1) HashMap permits null keys and null values; Hashtable permits neither.
(2) HashMap is not a thread-safe container; Hashtable is.
(3) HashMap makes no guarantee about the order of its entries; that order may even change as entries are added or removed.
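
Point (1) can be seen directly in code. A minimal sketch (the class name NullKeyDemo is made up for illustration; the put/get behavior is exactly as the JDK documents it):

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "value-for-null-key"); // HashMap allows one null key
        hashMap.put("key", null);                // ...and any number of null values
        System.out.println(hashMap.get(null));   // prints value-for-null-key

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "x");            // Hashtable rejects null keys (and null values)
        } catch (NullPointerException e) {
            System.out.println("Hashtable: null key -> NullPointerException");
        }
    }
}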

This implementation provides constant-time performance for the basic operations (get and put), assuming the 
hash function disperses the elements properly among the buckets.  Iteration over collection views requires 
time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number
of key-value mappings).  Thus, it's very important not to set the initial capacity too high (or the load factor too 
low) if iteration performance is important.

Key points from the passage above:
(1) Assuming the hash function disperses entries properly among the buckets, HashMap gives O(1) time for the basic get and put operations.
(2) Iteration time is proportional to the number of buckets plus the number of entries. So for good iteration performance, don't set the initial capacity too high, and don't set the load factor too low.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.  The 
capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time 
the hash table is created.  The load factor is a measure of how full the hash table is allowed to get before its 
capacity is automatically increased.  When the number of entries in the hash table exceeds the product of the 
load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so 
that the hash table has approximately twice the number of buckets.

Key points from the passage above:
(1) The initial capacity (the initial number of buckets) and the load factor are the two parameters that affect a HashMap's performance.
(2) When the number of entries exceeds the product of the load factor and the current capacity, the HashMap is resized (the number of buckets doubles).
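
To see point (2) concretely: with the defaults, the threshold is 16 × 0.75 = 12, so the 13th insertion doubles the table. The sketch below peeks at HashMap's private table field via reflection (ResizeDemo is a made-up name; on JDK 9+ this may require --add-opens java.base/java.util=ALL-UNNAMED):

import java.lang.reflect.Field;
import java.util.HashMap;

public class ResizeDemo {
    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>();
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            Object[] table = (Object[]) tableField.get(map);
            System.out.println("size=" + i + ", buckets=" + table.length);
        }
        // Expected: 16 buckets up to size 12, then 32 at size 13.
    }
}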

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs.  Higher 
values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the 
HashMap class, including get and put).  The expected number of entries in the map and its load factor should 
be taken into account when setting its initial capacity, so as to minimize the number of rehash operations.  If 
the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash 
operations will ever occur.

Key points from the passage above:
(1) The default load factor of 0.75 is a tradeoff between time and space costs; a higher load factor reduces the space overhead but increases the lookup cost.
(2) When choosing an initial capacity, take into account the number of entries you expect to store and the load factor, so as to minimize the number of resize operations.
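
For example, to hold 1,000 entries at the default load factor without ever rehashing, the capacity must exceed 1000 / 0.75 ≈ 1333.3. A sketch of the common sizing idiom (capacityFor is a hypothetical helper, not a JDK method):

import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Hypothetical helper: smallest initial capacity that holds
    // expectedSize entries without any rehash at load factor 0.75.
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // For 1000 expected entries: 1000 / 0.75 + 1 = 1334; HashMap
        // rounds this up internally to the next power of two, 2048.
        Map<String, Integer> map = new HashMap<>(capacityFor(1000));
    }
}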

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will 
allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to 
grow the table.  Note that using many keys with the same hashCode() is a sure way to slow down 
performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use 
comparison order among keys to help break ties.

Key points from the passage above:
(1) Many keys sharing the same hashCode() value will hurt HashMap's performance, as with any hash table.
(2) When many keys do share a hashCode() value, and the keys implement Comparable, HashMap can exploit the comparison order among keys to limit the damage.
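
A sketch of that worst case (BadKey is a made-up class): every instance hashes to the same bucket, but because the keys are Comparable, JDK 8's tree bins keep lookups at O(log n) instead of a linear scan:

import java.util.HashMap;
import java.util.Map;

public class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }           // constant hash: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) {
        return Integer.compare(id, other.id);                 // orderable, so tree bins stay O(log n)
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new BadKey(i), i);
        System.out.println(map.get(new BadKey(9_999)));       // still reasonably fast
    }
}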

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at 
least one of the threads modifies the map structurally, it must be synchronized externally.  (A structural 
modification is any operation that adds or deletes one or more mappings; merely changing the value 
associated with a key that an instance already contains is not a structural modification.)  This is typically 
accomplished by synchronizing on some object that naturally encapsulates the map.

The key point from the passage above:
When multiple threads access a HashMap concurrently and at least one of them modifies it structurally (e.g., adds or removes an entry), the access must be synchronized externally.

If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method.  This 
is best done at creation time, to prevent accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));

The key point from the passage above:
Collections.synchronizedMap() can be used to wrap a HashMap into a thread-safe container.
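
A minimal sketch of the wrapping idiom; note that, per the Collections documentation, iterating over the wrapped map still requires manual synchronization on the wrapper:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncDemo {
    public static void main(String[] args) {
        // Wrap at creation time so no unsynchronized reference escapes.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);

        // Iteration must still be guarded manually, synchronized on the wrapper:
        synchronized (m) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}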

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally 
modified at any time after the iterator is created, in any way except through the iterator's own remove method,  
the iterator will throw a ConcurrentModificationException.  Thus, in the face of concurrent modification, the 
iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined 
time in the future.

The key point from the passage above:
The fail-fast mechanism: while an iterator is traversing a HashMap, any structural modification to that HashMap makes the iterator throw ConcurrentModificationException.
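
A minimal sketch that triggers the exception (FailFastDemo is a made-up name):

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        for (String key : map.keySet()) {   // the for-each loop uses an iterator under the hood
            map.remove("b");                // structural modification during iteration
        }                                   // -> ConcurrentModificationException
    }
}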

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to 
make any hard guarantees in the presence of unsynchronized concurrent modification.  Fail-fast iterators 
throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a 
program that depended on this exception for its correctness: the fail-fast behavior of iterators should be 
used only to detect bugs.

The key point from the passage above:
The ConcurrentModificationException thrown by the fail-fast mechanism is a best-effort signal, not a guarantee; never write a program whose correctness depends on it.
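
If you need to remove entries while traversing, use the iterator's own remove(), which keeps the iterator's expected modification count in sync. A sketch:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SafeRemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // The iterator's own remove() updates its expected mod count,
        // so no ConcurrentModificationException is thrown.
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey().equals("b")) {
                it.remove();
            }
        }
        System.out.println(map); // {a=1}
    }
}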

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed 
into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use 
normal bins, but relay to TreeNode methods when applicable (simply by checking instanceof a node).  Bins of 
TreeNodes may be traversed and used like any others, but additionally support faster lookup when 
overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for 
existence of tree bins may be delayed in the course of table methods.

The key point from the passage above:
When a bucket accumulates too many entries, HashMap converts that bucket's linked-list nodes into red-black tree nodes.

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of 
ties, if two elements are of the same "class C implements Comparable<C>", type then their compareTo 
method is used for ordering. (We conservatively check generic types via reflection to validate this -- see 
method comparableClassFor).  The added complexity of tree bins is worthwhile in providing worst-case O(log 
n) operations when keys either have distinct hashes or are orderable. Thus, performance degrades gracefully 
under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, 
as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of 
these apply, we may waste about a factor of two in time and space compared to taking no precautions. But 
the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

Key points from the passage above:
(1) Tree nodes within a bucket are ordered primarily by hashCode; when hashCodes tie and the keys' class implements Comparable, the compareTo method defined by that interface is used to break the tie.
(2) HashMap provides the comparableClassFor() method to check (via reflection) the generic type information of a key's class.
(3) Lookup in a red-black tree takes O(log n) time.

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough 
nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or 
resizing) they are converted back to plain bins.  In usages with well-distributed user hashCodes, tree bins are 
rarely used.  Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution 
(http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default 
resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, 
the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million

Key points from the passage above:
(1) A tree node is about twice the size of a regular node, so a bucket's linked-list nodes are converted to red-black tree nodes only once the bucket holds at least TREEIFY_THRESHOLD nodes.
(2) When a bucket's node count drops low enough again, its red-black tree nodes are converted back into plain linked-list nodes.
(3) In theory, under random hashCodes, the number of nodes in a bucket follows a Poisson distribution with the probabilities listed above; note that the probability of a bucket reaching 8 nodes (the point of conversion to tree nodes) is only 0.00000006, about six in a hundred million.
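
The table can be reproduced straight from the formula exp(-0.5) * pow(0.5, k) / factorial(k); a small sketch (PoissonDemo is a made-up name):

public class PoissonDemo {
    public static void main(String[] args) {
        double p = Math.exp(-0.5); // k = 0 term: 0.5^0 / 0! = 1
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p = p * 0.5 / (k + 1);  // advance to the next term of the series
        }
        // Prints 0: 0.60653066 ... 8: 0.00000006, matching the table above.
    }
}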

The root of a tree bin is normally its first node.  However, sometimes (currently only upon Iterator.remove), 
the root might be elsewhere, but can be recovered following parent links (method TreeNode.root()).

The key point from the passage above:
Normally the root of a tree bin is its first node (currently, only Iterator.remove can leave the root elsewhere), but the root can always be recovered by following parent links, via the TreeNode.root() method.

All applicable internal methods accept a hash code as an argument (as normally supplied from a public 
method), allowing them to call each other without recomputing user hashCodes. Most internal methods also 
accept a "tab" argument, that is normally the current table, but may be a new or old one when resizing or 
converting.

When bin lists are treeified, split, or untreeified, we keep them in the same relative access/traversal order (i.e., field Node.next) to better preserve locality, and to slightly simplify handling of splits and traversals that invoke iterator.remove.

When using comparators on insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.

The key point from the passage above:
When bins are treeified, split, or untreeified, the nodes keep the same relative order under Node.next traversal.

The use and transitions among plain vs tree modes is complicated by the existence of subclass 
LinkedHashMap. See below for hook methods defined to be invoked upon insertion, removal and access that 
allow LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that 
a map instance be passed to some utility methods that may create new nodes.)

The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all of the twisty pointer operations.

Summary of the Comment Analysis

1. Thread safety
(1) HashMap is not thread-safe.
(2) When multiple threads access a HashMap concurrently and at least one of them modifies it structurally (e.g., adds or removes an entry), the access must be synchronized externally.
(3) Collections.synchronizedMap() can be used to wrap a HashMap into a thread-safe container.
2. Keys and values
(1) HashMap permits null keys and null values.
(2) HashMap makes no guarantee about the order of its entries; that order may even change as entries are added or removed.
(3) Assuming the hash function disperses entries properly among the buckets, HashMap gives O(1) time for the basic get and put operations.
(4) Many keys sharing the same hashCode() value will hurt HashMap's performance.
(5) When many keys do share a hashCode() value and the keys implement Comparable, HashMap exploits the comparison order among keys to limit the damage.
3. Initial capacity and load factor
(1) The initial number of buckets and the load factor are the two parameters that affect a HashMap's performance.
(2) Iteration time is proportional to the number of buckets plus the number of entries, so for good iteration performance, don't set the initial capacity too high or the load factor too low.
(3) When the number of entries exceeds the product of the load factor and the current number of buckets, the HashMap is resized (the number of buckets doubles).
(4) The default load factor of 0.75 is a tradeoff between time and space costs; a higher load factor reduces the space overhead but increases the lookup cost.
(5) When choosing the initial number of buckets, take into account the number of entries to be stored and the load factor, so as to minimize the number of resize operations.
4. The fail-fast mechanism
(1) While an iterator is traversing a HashMap, any structural modification to that HashMap makes the iterator throw ConcurrentModificationException.
(2) The fail-fast ConcurrentModificationException is best-effort, not a guarantee; never let your program's correctness depend on it.
5. Node conversion
(1) When a bucket accumulates too many entries, HashMap converts that bucket's linked-list nodes into red-black tree nodes.
(2) When a bucket's node count drops low enough, its red-black tree nodes are converted back into plain linked-list nodes.
(3) Tree nodes within a bucket are ordered by hashCode; when hashCodes tie and the keys' class implements Comparable, compareTo is used to break the tie.
(4) A tree node is about twice the size of a regular node, so a bucket is treeified only once it holds at least TREEIFY_THRESHOLD nodes.
(5) In theory, under random hashCodes, bucket sizes follow a Poisson distribution, as tabulated above; the probability of a bucket reaching 8 nodes (the treeification point) is only 0.00000006.
(6) Normally the root of a tree bin is its first node (only Iterator.remove can currently leave the root elsewhere), and it can be recovered via TreeNode.root().
(7) When bins are treeified, split, or untreeified, the nodes keep the same relative order under Node.next traversal.

HashMap's Superclass and Interfaces

After that long stretch of comments, you should have a first impression of HashMap's many properties. How are they all implemented? Let's look at the code, starting with the class HashMap extends and the interfaces it implements.
[Figure: HashMap's class hierarchy]
As the figure above shows, HashMap extends AbstractMap and implements the Map, Serializable, and Cloneable interfaces.
One puzzle is visible in the figure: HashMap implements Map directly, yet it also implements Map indirectly by extending AbstractMap. Why implement the same interface twice?
There are two common explanations:
(1) The explicit Map declaration lets Class's getInterfaces method report the Map interface directly. getInterfaces() does not include interfaces implemented by superclasses, so without the explicit declaration the returned array would not contain Map.class (see the sketch below).
(2) It is simply a quirk of how the code was written, with no deeper meaning.
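
Explanation (1) is easy to check (InterfacesDemo is a made-up name; getClass().getInterfaces() behaves as documented, returning only directly declared interfaces):

import java.util.Arrays;
import java.util.HashMap;

public class InterfacesDemo {
    public static void main(String[] args) {
        // Only interfaces declared directly on the class are returned;
        // Map would be missing here if HashMap did not re-declare it.
        System.out.println(Arrays.toString(HashMap.class.getInterfaces()));
        // [interface java.util.Map, interface java.lang.Cloneable, interface java.io.Serializable]

        // AbstractMap declares Map, but getInterfaces() never looks at superclasses:
        System.out.println(Arrays.toString(
                HashMap.class.getSuperclass().getInterfaces()));
        // [interface java.util.Map]
    }
}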

Constants in HashMap
// used for serialization and deserialization
private static final long serialVersionUID = 362498820763181265L;
// default initial capacity (number of buckets); must be a power of two, written here as a shift
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// maximum number of buckets; must be a power of two, written here as a shift
static final int MAXIMUM_CAPACITY = 1 << 30;
// default load factor
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// a bucket whose node count reaches this value is converted to tree nodes
static final int TREEIFY_THRESHOLD = 8;
// a tree bucket whose node count drops to this value is converted back to plain nodes
static final int UNTREEIFY_THRESHOLD = 6;
// a bucket may be treeified only when the table holds at least this many buckets;
// used in treeifyBin(), which we will get to later
static final int MIN_TREEIFY_CAPACITY = 64;

A few key questions deserve an answer here:
Why must the number of buckets be a power of two?
(1) With a power-of-two bucket count, the bucket index can be computed with a bit mask instead of a division (see the sketch after this list).
(2) This will come up again when we walk through putVal().
Why is the threshold for converting a bucket's plain nodes to tree nodes 8, and the threshold for converting tree nodes back to plain nodes 6?
(1) Recall the Poisson distribution discussed earlier: tree nodes cost more than plain nodes, and the probability of a bucket reaching 8 nodes is only about six in a hundred million, so 8 is chosen as the treeification threshold.
(2) The gap of 2 between the treeification threshold (8) and the untreeification threshold (6) prevents thrashing. If the two thresholds were equal, inserting one node could treeify a bucket, and removing that same node would immediately untreeify it; converting back and forth like that is expensive.
Why can a bucket be treeified only when there are at least 64 buckets?
According to the JDK comments, the table-capacity threshold for treeification should be at least 4 * TREEIFY_THRESHOLD, i.e., at least 32; otherwise treeification and resizing would conflict.
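
A small sketch of the masking trick behind the power-of-two requirement (MaskDemo is a made-up name): when n is a power of two, (n - 1) & h equals the non-negative remainder of h modulo n, at the cost of a single AND:

public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // a power of two, like HashMap's bucket count
        int[] hashes = {7, 16, 17, -5, 123456789};
        for (int h : hashes) {
            // (n - 1) & h keeps only the low bits of h: the bucket index.
            System.out.println(((n - 1) & h) + " == " + Math.floorMod(h, n));
        }
    }
}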

Node Types in HashMap
Node
static class Node<K,V> implements Map.Entry<K,V> {
    // hash is final: immutable, initialized once in the constructor
    final int hash;
    // key is a final reference, initialized in the constructor: the reference
    // cannot change, though the object it refers to may be mutable
    final K key;
    // value is mutable
    V value;
    // next pointer, mutable
    Node<K,V> next;

    // constructor
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // final: this method cannot be overridden
    public final int hashCode() {
        // XOR of the key's hashCode and the value's hashCode
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Key points:
(1) Node implements the Map.Entry interface and has four fields: the hash value, the key, the value, and the next pointer.
(2) A node's hash is immutable.
(3) A node's key reference is immutable, though the object it refers to may be mutable.
(4) A node's value and next pointer are both mutable.

TreeNode
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;           // true if this node is red (the tree is a red-black tree)

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // returns the root of the red-black tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }

    // adjusts the bucket's structure so that the given root becomes
    // the first node of its bucket
    static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
        int n;
        if (root != null && tab != null && (n = tab.length) > 0) {
            int index = (n - 1) & root.hash;    // locate the bucket
            TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
            if (root != first) {    // the root is not yet the bucket's first node
                Node<K,V> rn;
                tab[index] = root;  // put the root at the head of the bucket
                TreeNode<K,V> rp = root.prev;   // the root's predecessor
                // unlink the root from the doubly linked list,
                // joining its former neighbors to each other
                if ((rn = root.next) != null)
                    ((TreeNode<K,V>)rn).prev = rp;
                if (rp != null)
                    rp.next = rn;
                // put the root at the head of the doubly linked list
                if (first != null)
                    first.prev = root;
                root.next = first;
                root.prev = null;
            }
            assert checkInvariants(root);
        }
    }

    // finds the node with hash h and key k in the subtree rooted at this node
    final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
        TreeNode<K,V> p = this;
        do {
            int ph, dir; K pk;
            TreeNode<K,V> pl = p.left, pr = p.right, q; // the current node's children
            if ((ph = p.hash) > h)  // target hash is smaller: descend left
                p = pl;
            else if (ph < h)        // target hash is larger: descend right
                p = pr;
            // hashes equal and keys identical (or both null): found
            else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                return p;
            // hashes equal but keys differ; no left subtree, so go right
            else if (pl == null)
                p = pr;
            // hashes equal but keys differ; no right subtree, so go left
            else if (pr == null)
                p = pl;
            // compare the two keys via Comparable, if possible
            else if ((kc != null ||
                      (kc = comparableClassFor(k)) != null) &&
                     (dir = compareComparables(kc, k, pk)) != 0)
                p = (dir < 0) ? pl : pr;
            // keys not comparable, or compare equal: search the right subtree recursively
            else if ((q = pr.find(h, k, kc)) != null)
                return q;
            // not found on the right, so continue down the left
            else
                p = pl;
        } while (p != null);
        return null;
    }

    // finds the node with hash h and key k, starting from the root
    final TreeNode<K,V> getTreeNode(int h, Object k) {
        return ((parent != null) ? root() : this).find(h, k, null);
    }

    // forces a total order on two objects; the result is -1 or 1
    static int tieBreakOrder(Object a, Object b) {
        int d;
        // when both are non-null, compare their class names first
        if (a == null || b == null ||
            (d = a.getClass().getName().
             compareTo(b.getClass().getName())) == 0)
            // if either is null, or the class names are equal,
            // fall back to comparing identity hash codes
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
                 -1 : 1);
        return d;
    }