To be continued...
HashMap capacity
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
The default initial capacity is 16. A HashMap's capacity is always a power of two, for two reasons:
- HashMap locates a bin via hash % n. When n is a power of two, hash % n == hash & (n - 1), and the bitwise AND is cheaper than the modulo.
- When a HashMap resizes, the capacity doubles, and an element's new bin position depends only on the extra hash bit exposed by the larger mask: if that bit is 0, the index is unchanged; if it is 1, the new index is the old index + oldCap.
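Both facts can be verified with a small sketch (illustration only, not JDK source; the hash value 21 is arbitrary):

```java
// Demonstrates why HashMap's capacity must be a power of two.
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int n = 16;            // power-of-two capacity
        int hash = 0b1_0101;   // 21, an arbitrary hash value

        // When n is a power of two, the modulo can be replaced by a mask.
        System.out.println(hash % n);        // 5
        System.out.println(hash & (n - 1));  // 5

        // After doubling, the new index depends on one extra hash bit:
        // bit 0 -> index unchanged; bit 1 -> index + oldCap.
        int oldCap = 16, newCap = 32;
        int oldIndex = hash & (oldCap - 1);  // 5
        int newIndex = hash & (newCap - 1);  // 21, since the extra bit of 0b10101 is 1
        System.out.println(newIndex == oldIndex + oldCap);  // true
    }
}
```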
HashMap's underlying data structure
- JDK 1.8
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
...
}
/**
* Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
* extends Node) so can be used as extension of either regular or
* linked node.
*/
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
TreeNode<K,V> parent; // red-black tree links
TreeNode<K,V> left;
TreeNode<K,V> right;
TreeNode<K,V> prev; // needed to unlink next upon deletion
boolean red;
...
}
Array + linked list + red-black tree.
A HashMap key's hashCode is inherited from Object; as a rule it should be overridden (together with equals). Insertion works as follows:
- The hash value is computed by a spread function on hashCode (XOR of its high 16 bits with its low 16 bits);
- the bin index is hash % n (i.e. hash & (n - 1), which holds because n is a power of two);
- if that slot already holds elements, the new element's hash and key are compared with the existing ones: if they match, the value is overwritten; otherwise the node is appended to the end of the list;
- when a list's length exceeds the threshold (default 8), it is converted to a red-black tree (but only if the array length is greater than 64; otherwise the table is resized instead).
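The treeify decision in the last step can be sketched as below. The constants match JDK 8's TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY, but onChainGrew is a hypothetical helper for illustration, not the actual putVal/treeifyBin logic:

```java
// A simplified sketch of HashMap's list-to-tree decision (not JDK source).
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;      // chain length that triggers treeify
    static final int MIN_TREEIFY_CAPACITY = 64;  // minimum table length for treeify

    // Hypothetical helper: what happens when a bin's chain reaches the given length.
    static String onChainGrew(int chainLength, int tableLength) {
        if (chainLength < TREEIFY_THRESHOLD) return "keep list";
        return (tableLength < MIN_TREEIFY_CAPACITY) ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(onChainGrew(8, 16));  // resize  (table still too small)
        System.out.println(onChainGrew(8, 64));  // treeify
        System.out.println(onChainGrew(3, 16));  // keep list
    }
}
```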
Why the spread function (why not use hashCode directly)?
JDK comment:
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don’t benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
The spread function XORs the high 16 bits of the hashCode into its low 16 bits (a single round of perturbation in JDK 8).
The comment explains the reasoning:
① It is a tradeoff between speed, utility, and quality of bit-spreading.
② Mixing the high bits into the low bits strengthens the randomness of the low bits.
③ Without the mix, because of the table bounds (initially 16), the high bits would never take part in the index calculation.
④ Since many common sets of hashes are already reasonably distributed, and trees are used to handle large collision sets in bins (a bin is a slot of the table; the comment says "bin", which can also be read as "bucket"), the cheapest possible transform (one shift plus one XOR) is used to reduce systematic lossage (which I read as hash collisions).
Why is loadFactor's default value 0.75?
There is a comment near the top of the HashMap source:
As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).
This explains why the load factor defaults to 0.75: it is a tradeoff between time and space. Higher values reduce space overhead but increase the lookup cost.
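Concretely, the resize threshold is capacity × loadFactor, so a default map resizes once it grows past 12 entries (a minimal arithmetic illustration, not JDK source):

```java
// The default HashMap resizes when its size exceeds capacity * loadFactor.
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f; // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);  // 12
    }
}
```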
What is the Poisson distribution in the comment trying to say?
In the Implementation notes of the HashMap source, there is a comment:
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
First, with well-distributed hashCodes, tree bins are rarely used. Ideally, under random hashCodes and the default 0.75 load factor, the number of nodes in a bin (which can be translated as "bucket") follows a Poisson distribution with a parameter of about 0.5. The table in the comment shows that the probability of a bin reaching k = 8 nodes is tiny (about 6 in a hundred million), so conversion to a red-black tree is very unlikely. Conversely, when a bin does get treeified, it very probably holds unusually many nodes, and the red-black tree is worth its cost.
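The table above can be reproduced directly from the Poisson formula P(k) = exp(-0.5) * 0.5^k / k! (illustration only):

```java
// Reproduces the probability table from the HashMap comment:
// P(k) = exp(-0.5) * pow(0.5, k) / factorial(k).
public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5;
        double p = Math.exp(-lambda);  // P(0) = 0.60653066...
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            // Recurrence: P(k+1) = P(k) * lambda / (k + 1).
            p *= lambda / (k + 1);
        }
    }
}
```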
ConcurrentHashMap