HashMap 1.8 相比于 1.7 及之前, 将集合的结构变为:数组+链表+红黑树。 引入红黑树自然是为了提高查询效率。这里摘录部分源码进行分析:
- 开头注释
- 几个类常量属性
- 构造方法
- put, hash, get, remove方法
- 为什么capacity为2的次幂
1. 开头注释部分
我觉得还是可以读一下的,这一部分大致总结了一下HashMap
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
*
这一段提出 HashMap 实现了 Map 接口,提供了所有可选择的map操作,
并且允许values(多个)值为null,允许存在一个为null的key,
其与Hashtable相似,差别于:
- HashMap可以存放null值,而Hashtable不行;
- 同时,HashMap非线程同步,即非线程安全的,而Hashtable是线程安全的。
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
- 当hash方法可以使使元素均匀分布在buckets(我认为指的是数组中的每一个位置)中时,HashMap的get和put操作,是常量时间的(时间复杂度O(1))。
- 对于HashMap集合的遍历,需要的时间和buckets数量(数组大小,capacity)以及 键值对数量 之和 成比例。
- 因此,如果对遍历的性能很看重的话,最好不要将初始的capacity设置的过大(或者将负载因子设置的过小)
- 负载因子用于决定何时需要对数组扩容,默认值为0.75
<p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
- 影响HashMap性能的两个实例变量分别为:initial capacity(初始容量) 以及 load factor(负载因子)
- 初始容量是指hash表创建时的容量。负载因子是指:当hash表的使用程度为多少时,对其容量进行扩充。
- 当entries(键值对)数量超过 load factor * capacity 时,对hash表进行扩容,也就是rehashed,将其扩容至二倍。
<p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
- load factor 设置为默认值0.75时对时间和空间消耗达到一个平衡。较大的load factor会减少空间消耗但是会增加查找消耗。
- 当设置初始capacity时需要考虑entries数量以及load factor,以最小化rehash操作的次数。
- 当 初始 capacity > entries数量/load factor,扩容就不会发生。
<p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
如果需要将很多键值对存进HashMap,建议创建一个capacity足够大的map,而不是让其通过rehash来扩容。
- 注意:使用很多具有相同hashCode值的keys必然会降低hash表的性能。当keys实现了Comparable接口时,可以使用keys之间的比较顺序来减小影响(这句不太明白)
/* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>*/
- HashMap是非线程同步的。若多个线程同时访问一个HashMap对象,且至少有一个线程修改了map结构(结构改变是指添加或删除一个或多个键值对,而不是仅仅改变了已经存在的key与value的关系),就必须要进行外部同步,一般通过对一些封装map的对象进行同步来实现(我认为就是使用一些线程安全之类的手段如Sychronized关键字之类的)
- 若没有这种对象存在,map应该使用Collections.sychronizedMap()方法进行包装。这一过程最好在创建时完成,以避免对map的突然非同步访问。使用示例:
Map m = Collections.synchronizedMap(new HashMap(…));
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
HashMap的迭代器是具有 fail-fast 特性的:若迭代器创建后的任意时刻,map结构被修改,除了迭代器自身的remove方法, 均会抛出ConcurrentModificationException异常。因此,在面临并发修改时,iterator会立马失败,以避免冒任何风险。
<p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
- 注:iterator的fail-fast 行为是不能被保证的,因为,通常,在发生非同步并发修改时做出任何硬性保障是不可能的。Fail-fast 迭代器只能尽最大的努力抛出ConcurrentModificationException异常。
- 因此,依赖该异常编写程序确定其正确性是不对的:fail-fast行为应被用于检测bugs。
下面开始代码部分
2. 几个类常量属性:
DEFAULT_INITIAL_CAPACITY: 默认初始容量,值为2的4次方,必须为2的幂次方(最后解释)。
MAXIMUM_CAPACITY: 最大容量, 2的30次方,同时也要求是2的次幂。
DEFAULT_LOAD_FACTOR:默认负载因子,为0.75
TREEIFY_THRESHOLD:将链表转化为树的阈值,当链表的长度达到8时转换为红黑树。
UNTREEIFY_THRESHOLD:当树中节点数小于等于6时,将其转换为链表。(之前看到有人说这个和TREEIFY_THRESHOLD之间差2的原因是:防止树形化和链表化的操作太频繁,我觉有点道理)
MIN_TREEIFY_CAPACITY:capacity 大于等于 这个值(64)时,才可以进行树形化,小于该值时,可能出现一个bucket 里的节点过多,此时应该将capacity重新设置,而不是进行树形化。
一个静态内部类,用于链表及红黑树。
3. HashMap的构造方法:有4个
- 含有初始容量initialCapacity和负载因子loadFactor两个参数的构造方法,这里就不解释代码了
- 只有初始容量一个参数的构造方法,这里调用了上一个构造方法,负载因子设置为默认值0.75
- 无参构造方法(常用)
- 含有一个Map类型参数的构造方法,将一个map复制给新的hash map。
4. put, hash, get, remove 方法 - put()方法:向HashMap对象中添加键值对,可以看到其调用了hash()方法计算key的hash值,并调用了putVal()方法。
- hash()方法,将key.hashCode()结果的高16位及低16为进行了异或。
注释中给出了原因,最后使用hash值确定该key的key-value对应的数组位置时,会使用 hash&(n-1)。 如果直接使用两个key的hashCode()进行操作,很容易发生hash冲突,将高低位异或会减少这种事情的发生。
如:假设两个key: key1,key2
,其hashCode()值如下
虽然key1 != kye2
, 但是key1& (n-1) == key2 & (n-1)
但是hash操作后:hash(key1)
变为:
hash(key2)
:
hash(key1) & (n-1)
:
hash(key2) & (n-1)
:
明显结果不同了,这样就减少了hash冲突。
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
put大致步骤如下:
首先需要确保数组是存在的
之后,确定数组索引hash & (n-1)位置上是否为null,即判断是否要放入的key的位置上有hash冲突。
若数组该位置为空,生成一个含有key-value信息的Node,放入数组该位置。
不为空:
若该头节点key与要插入的key相同,则不用新生成Node,直接对该节点进行修改即可
若不同:
若头节点为树节点,则调用putTreeVal方法进行操作
若头节点为链表节点,遍历链表;
若链表中无该key,则生成一个Node
若有,直接对该节点进行修改即可
完成上述操作后:
++modCount(修改次数)
同时需要判断是否需要扩容。
- get方法:
其调用getNode()
方法:
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
get 方法大致步骤:
需要确定数组存在,不为空且key对应的索引位中不为空
若数组中的头节点Node的key与key相同,直接返回该节点即可
头节点key与key不同:
若头节点为树节点,则调用getTreeNode方法,在树中查询
头节点为链表节点,遍历链表,存在key的节点则返回该节点
完成上述操作,若没有key的节点,则返回null.
- remove 方法
调用了removeNode方法
final Node<K,V> removeNode(int hash, Object key, Object value,
boolean matchValue, boolean movable) {
Node<K,V>[] tab; Node<K,V> p; int n, index;
if ((tab = table) != null && (n = tab.length) > 0 &&
(p = tab[index = (n - 1) & hash]) != null) {
Node<K,V> node = null, e; K k; V v;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
node = p;
else if ((e = p.next) != null) {
if (p instanceof TreeNode)
node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
else {
do {
if (e.hash == hash &&
((k = e.key) == key ||
(key != null && key.equals(k)))) {
node = e;
break;
}
p = e;
} while ((e = e.next) != null);
}
}
if (node != null && (!matchValue || (v = node.value) == value ||
(value != null && value.equals(v)))) {
if (node instanceof TreeNode)
((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
else if (node == p)
tab[index] = node.next;
else
p.next = node.next;
++modCount;
--size;
afterNodeRemoval(node);
return node;
}
}
return null;
}
removeNode方法步骤:
确定数组存在不为空且(hash & n-1)处有Node:
首先要找到key的节点:
若头节点key为key:node = p
否则:
若头节点为树节点:调用getTreeNode方法
头节点为链表节点:遍历链表找key的Node
找到节点,若matchValue为true需要确定value值也相同(但是matchValue默认值为false,因此这里不用考虑value值)
若节点为树节点:调用removeTreeNode方法
为头节点:删除头节点
为链表节点:删除该节点
最后说一下为什么capacity,即数组容量为2的次幂?
原因很简单:保证hash & (capacity-1)可以落在 0~capacity-1上。