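- This article analyzes the HashMap source code in JDK 1.8
HashMap characteristics
- Allows null keys and null values
- Does not guarantee mapping order: iteration order may differ from insertion order (use LinkedHashMap if insertion order must be preserved)
- Spreads elements across the buckets as evenly as possible, so that get/put run in constant time as long as the hash function disperses the keys well
- Iteration time is proportional to the capacity (the length of table) plus the size (the number of key-value mappings), so when iteration performance matters the initial capacity should not be set too high, nor the load factor too low
- Initial capacity: the number of buckets in the hash table when it is created. The table length is always a power of two; a requested capacity that is not a power of two is rounded up to the next one
- Load factor: how full the hash table may get before it is resized; capacity * loadFactor = the resize threshold, i.e. the number of mappings the HashMap holds before it grows
- A larger load factor packs the table more densely and fits more elements into the same space, but the lists grow longer and lookups slow down. A smaller load factor keeps the lists short and lookups fast, at the cost of wasted space
- Not thread-safe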
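The first two properties in the list above are easy to check with a small program. The following is only a minimal sketch: the class name NullAndOrderDemo and its helper methods are made up for illustration, and only the HashMap and LinkedHashMap calls come from the JDK.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class NullAndOrderDemo {
    /** Returns the keys of the map in iteration order, joined by commas. */
    static String keysInOrder(Map<String, Integer> map) {
        return String.join(",", map.keySet());
    }

    /** Inserts three keys in a fixed order and returns the same map. */
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashMap = new HashMap<>();
        hashMap.put(null, 0);   // one null key is allowed
        hashMap.put("k", null); // null values are allowed as well
        System.out.println(hashMap.get(null));        // 0
        System.out.println(hashMap.containsKey("k")); // true

        // LinkedHashMap iterates in insertion order: zebra,apple,mango
        System.out.println(keysInOrder(fill(new LinkedHashMap<>())));
        // HashMap makes no ordering promise, so no particular output is claimed here
        System.out.println(keysInOrder(fill(new HashMap<>())));
    }
}
```

Because a null value is legal, get() returning null does not prove that a key is absent; containsKey() is the reliable check.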
HashMap definition
public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable {
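- Cloneable: a marker interface. A class has to implement it and override Object's clone() for cloning to succeed; calling Object.clone() on an instance of a class that does not implement it throws CloneNotSupportedException.
- Serializable: a marker interface indicating that the class can be serialized and deserialized.
HashMap data structure
- Default configuration parameters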
/**
* The default initial capacity - MUST be a power of two.
*/
// Default initial capacity: 16; must be a power of two
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
// Maximum capacity: 2 to the 30th power
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
*/
// Default load factor: 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
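// Threshold for converting a list into a red-black tree: when an insertion makes a bin's list longer than 8 nodes, the list is converted to a red-black tree (provided the table is large enough, see MIN_TREEIFY_CAPACITY)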
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
// When a resize splits a tree bin and one of the resulting halves holds <= 6 nodes, that half is converted back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
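// Bins may be treeified only when the table capacity is >= this value
// Otherwise, when a bin holds too many nodes, the table is resized instead of treeified
// To avoid conflicts between resizing and treeification, this value must not be less than 4 * TREEIFY_THRESHOLD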
static final int MIN_TREEIFY_CAPACITY = 64;
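- Storage structure:
- HashMap holds an array of Node called table. The array index is derived from the key's hash. Each Node has a next field, and collisions are resolved by separate chaining: keys that map to the same index are stored in the same linked list.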
transient Node<K,V>[] table;
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final boolean equals(Object o) {
if(o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
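- Red-black tree node structure: JDK 1.8 introduced red-black trees into HashMap. When a list grows longer than TREEIFY_THRESHOLD = 8 (and the table has at least MIN_TREEIFY_CAPACITY = 64 buckets), the list is converted to a red-black tree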
/**
* Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
* extends Node) so can be used as extension of either regular or
* linked node.
*/
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
TreeNode<K,V> parent; // red-black tree links
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev; // needed to unlink next upon deletion
boolean red;
TreeNode(int hash, K key, V val, Node<K,V> next) {
super(hash, key, val, next);
}
}
// LinkedHashMap.Entry (declared in LinkedHashMap), shown here to complete the inheritance chain
static class Entry<K,V> extends HashMap.Node<K,V> {
Entry<K,V> before, after;
Entry(int hash, K key, V value, Node<K,V> next) {
super(hash, key, value, next);
}
}
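- The inheritance chain above shows that TreeNode is an indirect subclass of Node: TreeNode extends LinkedHashMap.Entry, which extends HashMap.Node
A question to think about:
Why doesn't TreeNode extend Node directly?
- Related member variables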
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
// threshold = capacity * load factor; once size exceeds this value, the table has to be resized
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
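To make the threshold arithmetic concrete, here is a small sketch of the capacity * loadFactor computation, modeled on the newThr == 0 branch of resize(). The class ThresholdSketch and the method thresholdFor are hypothetical names for illustration and are not part of the JDK.

```java
final class ThresholdSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * Computes capacity * loadFactor the way resize() does when no threshold
     * has been derived yet, clamping to Integer.MAX_VALUE at the upper limit.
     */
    static int thresholdFor(int capacity, float loadFactor) {
        float ft = (float) capacity * loadFactor;
        return (capacity < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY)
                ? (int) ft
                : Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // With the default load factor the threshold doubles along with the capacity.
        System.out.println(thresholdFor(16, 0.75f)); // 12
        System.out.println(thresholdFor(32, 0.75f)); // 24
        System.out.println(thresholdFor(64, 0.75f)); // 48
    }
}
```

With the defaults, a table of 16 buckets is therefore resized when the 13th mapping is inserted.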
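HashMap constructors
- No-arg constructor: uses the default initial capacity of 16 and the default load factor of 0.75. Note that the table array is not created here
The constructor does not allocate the table array. It only sets the load factor; the default capacity is applied later, inside resize()
In other words, the table array is actually created by the first put() operation.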
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
- Constructor with a specified initial capacity
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
- Constructor with a specified initial capacity and load factor
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
tableSizeFor() computes the smallest power of two that is not less than initialCapacity. The design is clever; here is how it is implemented
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
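- If cap is already a power of two, subtracting 1 first and adding 1 at the end returns cap itself
- If cap is 1, n is 0 and stays 0 through every step; adding 1 returns 1, which is again cap
- If cap is 0 or negative, n is negative. Unsigned right shifts followed by OR leave the sign bit set, so n stays negative and the method returns 1
- If cap is greater than 1, n is greater than 0 and tableSizeFor() works as follows:
- n is non-zero, so at least one bit is 1; let bit i be the highest 1 bit in the binary representation of n
- After n |= n >>> 1, bits i and i-1 are both 1, so the two highest bits (from bit i down) are guaranteed to be 1
- After n |= n >>> 2, bits i, i-1, i-2 and i-3 are 1, so the four highest bits are 1
- And so on: after the shift by 16, up to 32 bits could have been set. Because the constructor caps the capacity at MAXIMUM_CAPACITY = 1 << 30, n has at most 30 one bits
- After these steps, every bit from bit i downward is 1
- Adding 1 at the end yields the smallest power of two that is not less than initialCapacity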
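The cases listed above can be checked by running a copy of the method. In this sketch the method body is copied from the JDK 1.8 source quoted earlier; the wrapper class TableSizeForDemo and the chosen inputs are illustrative.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Copy of HashMap.tableSizeFor() from JDK 1.8. */
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that an exact power of two maps to itself
        n |= n >>> 1;      // top 2 bits below the highest 1 bit are now set
        n |= n >>> 2;      // top 4 bits
        n |= n >>> 4;      // top 8 bits
        n |= n >>> 8;      // top 16 bits
        n |= n >>> 16;     // every bit below the highest 1 bit
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(0));             // 1
        System.out.println(tableSizeFor(1));             // 1
        System.out.println(tableSizeFor(16));            // 16
        System.out.println(tableSizeFor(17));            // 32
        System.out.println(tableSizeFor(100));           // 128
        System.out.println(tableSizeFor((1 << 30) + 1)); // 1073741824
    }
}
```

For cap = 17, n starts at 16 (binary 1 0000), the OR cascade turns it into 31 (binary 1 1111), and adding 1 gives 32.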
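Questions to think about:
1. The no-arg constructor does not assign threshold (HashMap(int) simply delegates to this constructor), so why is it assigned here?
2. One would expect threshold = tableSizeFor(initialCapacity) * loadFactor;
so why is it simply threshold = tableSizeFor(initialCapacity)? (Hint: see the comment on the threshold field and the oldThr > 0 branch of resize().)
- Constructing a HashMap from a given Map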
/**
* Constructs a new <tt>HashMap</tt> with the same mappings as the
* specified <tt>Map</tt>. The <tt>HashMap</tt> is created with
* default load factor (0.75) and an initial capacity sufficient to
* hold the mappings in the specified <tt>Map</tt>.
*
* @param m the map whose mappings are to be placed in this map
* @throws NullPointerException if the specified map is null
*/
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
Adding elements
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
First, how the table array is created
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
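- table is checked first. If it is null or empty, resize() is called and its return value is assigned to tab; how resize() works is covered in the resizing section below
How the array index for a key-value pair is computed from the key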
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
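// array index
i = (n - 1) & hash; // n is the length of table, a power of two
- HashMap allows a null key; a null key hashes to 0 and is therefore stored at index 0 of table
- For a non-null key, the index computation is equivalent to the following code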
public int indexOf(Object key) {
int h = key.hashCode();
return (h ^ (h >>> 16)) & (length - 1);
}
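Why not simply use h & (length - 1)? What is h >>> 16, and what is it for?
The source comment explains: Computes key.hashCode() and spreads (XORs) higher bits of hash to lower.
That is, the upper 16 bits of hashCode are folded into the lower 16 bits so that the hash is better dispersed
h >>> 16 shifts h right by 16 bits without sign extension, which yields the upper 16 bits of h. XORing that with h mixes the upper 16 bits into the lower 16 bits, so both halves influence the resulting index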
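1010 1010 0001 0100 1111 0101 h=11146485
0000 0000 0000 0000 1010 1010 h >>> 16
1010 1010 0001 0100 0101 1111 ^ // hash = h ^ (h >>> 16) = 11146335; the high bits now influence the low bits
1111 1111 1111 1111 1111 1111 (length - 1) = 16777215
1010 1010 0001 0100 0101 1111 & result = 11146335 // changes in both the high and the low bits are reflected in the index, spreading keys as widely as possible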
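The effect of the XOR step is easiest to see with two hash codes that differ only in their upper 16 bits. The sketch below uses hypothetical helpers (spread, indexWithoutSpread, indexWithSpread) that take an int hash code directly instead of an Object key; the arithmetic is the same as in hash() and the index expression shown above.

```java
final class HashSpreadDemo {
    /** Same arithmetic as HashMap.hash() for a non-null key with hashCode h. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Index if the raw hashCode were masked directly. */
    static int indexWithoutSpread(int h, int length) {
        return h & (length - 1);
    }

    /** Index as HashMap computes it: spread first, then mask. */
    static int indexWithSpread(int h, int length) {
        return spread(h) & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // differs from b only above bit 15
        int b = 0x20000;
        int length = 16;

        // Masking the raw hash codes ignores the high bits: both land in bucket 0.
        System.out.println(indexWithoutSpread(a, length)); // 0
        System.out.println(indexWithoutSpread(b, length)); // 0

        // After spreading, the high bits reach the low bits and the buckets differ.
        System.out.println(indexWithSpread(a, length)); // 1
        System.out.println(indexWithSpread(b, length)); // 2

        // The worked example from the text.
        System.out.println(spread(11146485)); // 11146335
    }
}
```

For small tables only the lowest few bits select the bucket, so without this step keys whose hash codes differ only in the high bits would always collide.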
Next comes an if…else… that checks whether the slot for the new node is empty
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
// collision: resolve the conflict
}
i = (n - 1) & hash computes the array index from the hash
- If the slot is empty, a Node is created with newNode() and placed directly into table
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
return new Node<>(hash, key, value, next);
}
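- If the slot is not empty, a collision has occurred and has to be resolved
How a collision is resolved
- On a collision there are two cases: the key is the same, or the key is different
- How to tell whether the keys are the same
- First, the hashes of the keys must be equal
- Second, the keys must be the same object, or equals() must return true
// p is the existing node; key is the key being inserted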
p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))
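- If p is an instance of TreeNode, the bin is a red-black tree and the node is inserted at the appropriate position in the tree
- Otherwise the bin holds at most 8 nodes in a singly linked list, which is walked node by node:
- ① If the next node is null, the new node is appended to the tail; if the list is longer than 8 nodes after the insertion, treeifyBin() is called.
- ② If the next node has the same hash and an equal key, its value is overwritten (the node stays the same; only value changes).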
// p = tab[i = (n - 1) & hash]
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p; // e holds the node whose key equals the new key
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null); // key not present: append the new node to the tail
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash); // the list is longer than 8 nodes after the insertion: treeify (or resize if the table is still small)
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
// the key already exists
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
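Both collision cases can be observed through the public API. The sketch below relies on the fact that the strings "Aa" and "BB" have the same String.hashCode() (2112), so they always map to the same bin; the class CollisionDemo and the method run are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {
    /**
     * Inserts two keys with equal hash codes, then overwrites the first one.
     * Returns what the overwriting put() returned.
     */
    static String run(Map<String, String> map) {
        map.put("Aa", "first");
        map.put("BB", "second");         // same hash, different key: a new node in the same bin
        return map.put("Aa", "updated"); // same hash, equal key: value replaced, old value returned
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, String> map = new HashMap<>();
        System.out.println(run(map));      // first
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // updated
        System.out.println(map.get("BB")); // second
    }
}
```

The second put of "Aa" takes the e != null branch: it changes the value in place, returns the old value, and leaves size and modCount untouched.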
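How a linked list is converted to a red-black tree
- If the table is null or its length is less than the minimum treeify capacity (MIN_TREEIFY_CAPACITY = 64), resizing is used to spread the elements more evenly and no tree is built
- If the table length is at least 64 (and the list is longer than 8 nodes), the list is converted to a red-black tree
- Treeification
- First, each Node is replaced by a TreeNode
- Then the singly linked list becomes a doubly linked list
- Finally the doubly linked list is turned into a red-black tree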
/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
// The loop below walks the list, turning the singly linked Nodes into doubly linked TreeNodes
do {
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null)
hd = p;
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
// Put the head node into table
if ((tab[index] = hd) != null)
hd.treeify(tab); // the head node calls treeify(), which builds the tree structure starting from that node
}
}
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
return new TreeNode<>(p.hash, p.key, p.value, next);
}
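The interplay of the two thresholds can be condensed into one decision. The following is a simplified sketch, not JDK code: the class TreeifyDecisionSketch, the Action enum and the parameter chainLengthAfterInsert (the number of nodes in the bin once the new node has been appended) are assumptions introduced for illustration.

```java
final class TreeifyDecisionSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    /**
     * Models what happens after a node has been appended to a list bin:
     * putVal() calls treeifyBin() only when the list is now longer than 8 nodes,
     * and treeifyBin() resizes instead of treeifying while the table is below 64.
     */
    static Action decide(int tableLength, int chainLengthAfterInsert) {
        if (chainLengthAfterInsert <= TREEIFY_THRESHOLD) {
            return Action.NONE;
        }
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        System.out.println(decide(16, 8)); // NONE
        System.out.println(decide(16, 9)); // RESIZE
        System.out.println(decide(64, 9)); // TREEIFY
    }
}
```

The sketch ignores the separate size > threshold check at the end of putVal(), which can trigger a resize on its own.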
Finally, check whether size exceeds the resize threshold
++modCount; // counts structural modifications
if (++size > threshold)
resize();
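- After the element is inserted, size is incremented
- The element is inserted first and the resize check happens afterwards (note: JDK 1.7 resizes first and then inserts)
resize(): creating or growing table
- Compute the new capacity newCap
- Compute the new resize threshold newThr
- Create the new table newTab
- Move the elements of the old table into the new table
- Iterate over the table array
- Iterate over each linked list or red-black tree
- Each non-empty slot of the old table is set to null and its nodes are placed into the new table, either at the same index or at index + old capacity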
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
if (oldTab != null) {
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
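A question to think about: why can 1.8 locate a node's position in the new array during a resize without recomputing the hash?
Because a resize doubles the array, the mask used to compute the index gains exactly one more 1 bit at the high end
With length 16, n - 1 in (n - 1) & hash is 0000 1111 in binary; after growing to 32 it is 0001 1111, with one more high bit set.
Since the operation is &, ANDing a bit with 1 leaves it unchanged, so there are two cases, as shown in the figure below: bit 4 of the hash (the bit with value 16) is either 0 or 1;
If that bit is 0, the index is unchanged; if it is 1, the new index is the old index plus 16 (the old capacity)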
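The two cases can be verified numerically. The sketch below compares the shortcut used by resize(), which tests (hash & oldCap), against a full recomputation of the index with the doubled capacity; the class ResizeSplitDemo and its methods are hypothetical names for illustration.

```java
final class ResizeSplitDemo {
    /** Index of a (spread) hash in a table of the given power-of-two capacity. */
    static int index(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    /** New index after doubling, derived from a single extra bit as in resize(). */
    static int newIndex(int hash, int oldCap) {
        int oldIndex = index(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int lowHash = 5;   // binary 0 0101: bit 4 is 0
        int highHash = 21; // binary 1 0101: bit 4 is 1

        // Both hashes share bucket 5 in the old table.
        System.out.println(index(lowHash, oldCap));  // 5
        System.out.println(index(highHash, oldCap)); // 5

        // After doubling, one stays and the other moves by oldCap.
        System.out.println(newIndex(lowHash, oldCap));  // 5
        System.out.println(newIndex(highHash, oldCap)); // 21
    }
}
```

This is why resize() can split each bin into a lo list (stays at j) and a hi list (moves to j + oldCap) in one pass.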
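When resize() is called
- On the first put() after construction, in which case it only creates the table array
- After an insertion, if size exceeds the resize threshold (threshold = capacity * loadFactor), the table is grown
- When a list exceeds 8 nodes but the table is null or shorter than the minimum treeify capacity (MIN_TREEIFY_CAPACITY = 64), resizing spreads the elements more evenly instead of building a tree
putAll(map)
- A typical use: create a new Map and call putAll() to add the mappings of an existing Map to it
- The existing Map is iterated and each mapping is put into the new Map. The new Map stores references to the same key and value objects; no new key or value objects are created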
public void putAll(Map<? extends K, ? extends V> m) {
putMapEntries(m, true);
}
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
int s = m.size();
if (s > 0) {
if (table == null) { // pre-size
float ft = ((float)s / loadFactor) + 1.0F;
int t = ((ft < (float)MAXIMUM_CAPACITY) ?
(int)ft : MAXIMUM_CAPACITY);
if (t > threshold)
threshold = tableSizeFor(t);
}
else if (s > threshold) // not s + size() > threshold, because in JDK 8 the table is grown only after an element has been put
resize();
for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
K key = e.getKey();
V value = e.getValue();
putVal(hash(key), key, value, false, evict);
}
}
}
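That putAll() copies references rather than objects can be seen with a mutable value. The following is a minimal sketch; the class PutAllDemo and the method copyOf are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutAllDemo {
    /** Copies all mappings of source into a fresh HashMap via putAll(). */
    static Map<String, List<String>> copyOf(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new HashMap<>();
        copy.putAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, List<String>> source = new HashMap<>();
        List<String> tags = new ArrayList<>();
        tags.add("a");
        source.put("k", tags);

        Map<String, List<String>> copy = copyOf(source);

        // Both maps point at the very same List object.
        System.out.println(copy.get("k") == source.get("k")); // true

        // Mutating the value through one map is visible through the other.
        copy.get("k").add("b");
        System.out.println(source.get("k").size()); // 2

        // The maps themselves are independent: new mappings are not shared.
        copy.put("other", new ArrayList<>());
        System.out.println(source.containsKey("other")); // false
    }
}
```

If independent values are needed, each value has to be copied explicitly before it is put into the new map.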
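JDK 1.8 optimizations compared with 1.7
1.8 brings three main optimizations:
- 1. The hash function was simplified: 1.7 used four shifts and four XORs, 1.8 uses one of each
The JDK 1.7 hash function: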
static int hash(int h) {
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
JDK1.8:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
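- 2. Array + linked list became array + linked list or red-black tree;
- This guards against lists growing too long when many keys collide, reducing the lookup time in such a bin from O(n) to O(log n);
- 3. List insertion changed from head insertion to tail insertion. When the array slot is already occupied, 1.7 places the new element in the array slot and makes the existing node its successor, while 1.8 walks the list and appends the element at the end;
- Because head insertion in 1.7 reverses the list during a resize, concurrent resizes can produce a cycle;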
void transfer(Entry[] newTable, boolean rehash) {
int newCapacity = newTable.length;
for (Entry<K,V> e : table) {
while(null != e) {
Entry<K,V> next = e.next;
if (rehash) {
e.hash = null == e.key ? 0 : hash(e.key);
}
int i = indexFor(e.hash, newCapacity);
e.next = newTable[i]; // if thread A is suspended at this line, thread B starts its own resize
newTable[i] = e;
e = next;
}
}
}
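- Thread A is inserting a node while thread B is inserting as well. B finds the capacity insufficient and starts a resize, rehashing and relocating the elements with head insertion. Nodes visited later are placed at the head, which reverses the list, and when A resumes with its stale references a cycle forms, as shown in the figure below:
- During a resize, 1.7 rehashes every element of the old array to locate its position in the new array; 1.8 uses simpler logic: the position is either unchanged or the old index plus the old capacity;
- On insertion, 1.7 first checks whether a resize is needed and then inserts; 1.8 inserts first and then checks whether a resize is needed;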
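The list reversal behind point 3 can be reproduced in a single thread. This sketch does not reproduce the concurrent cycle itself; it only shows that relinking with head insertion, as in the 1.7 transfer() loop, reverses a bucket's list, while tail appending, as in the 1.8 resize(), preserves the order. Node, relinkHeadFirst and relinkTailAppend are hypothetical names, and Node is a simplified stand-in for the internal entry type.

```java
final class TransferOrderDemo {
    /** Minimal singly linked node; a stand-in for HashMap's internal entry type. */
    static final class Node {
        final int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    /** Builds a list that contains the values in the given order. */
    static Node listOf(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) head = new Node(values[i], head);
        return head;
    }

    /** Head insertion, following the pointer moves of the 1.7 transfer() loop. */
    static Node relinkHeadFirst(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            e.next = newHead; // e.next = newTable[i]
            newHead = e;      // newTable[i] = e
            e = next;
        }
        return newHead;
    }

    /** Tail appending, following the loHead/loTail pattern of the 1.8 resize(). */
    static Node relinkTailAppend(Node head) {
        Node newHead = null, tail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if (tail == null) newHead = e; else tail.next = e;
            tail = e;
            e = next;
        }
        if (tail != null) tail.next = null;
        return newHead;
    }

    /** Renders a list as "1->2->3". */
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            sb.append(e.value);
            if (e.next != null) sb.append("->");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(relinkHeadFirst(listOf(1, 2, 3))));  // 3->2->1
        System.out.println(render(relinkTailAppend(listOf(1, 2, 3)))); // 1->2->3
    }
}
```

Because 1.8 keeps the relative order of nodes during a resize, this particular cycle can no longer form, although HashMap remains unsafe for concurrent modification.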