HashMap中有哪些常量?这些常量设计的目的是什么?本篇带你走近Doug Lea、Josh Bloch、Arthur van Hoff、 Neal Gafter对HashMap的设计。(以下都是基于jdk1.8)
常量设计 |
(1)HashMap默认初始化大小是1 << 4(即16)
/**
* The default initial capacity - MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
关于这个变量,注释说“MUST be a power of two”,即必须是2的幂次方。为什么一定要是2的幂次方呢?
HashMap底层数据结构是数组+链表(或数组+红黑树),当添加元素时,索引定位使用的是i =(n - 1) & hash ,当初始化大小n是2的幂次方时,它就等价于 n % hash 。定位下标一般用取余法,而按位与(&)运算的效率要比取余(%)运算的效率高,所以默认初始化大必须为2的幂次方,就是为了使用更高效的与运算。
默认初始化大小为什么是16而不是8或者32?如果太小,扩容比较频繁;如果太大,又占用内存空间。这算是jdk为我们做的初始权衡吧。
(2)HashMap最大容量是1<<30,即2的30次方
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
我们知道int是占4个字节,一个字节是8位,所以说是32位整型,那按理说可以左移31位,即2的31次幂。在这里为什么不是2的31次方呢?实际上,二进制数的最左边那一位是符号位,用来表示正负的。我们来看下面的例子:
System.out.println(1 << 30);
System.out.println(1 << 31);
System.out.println(1 << 32);
System.out.println(1 << 33);
输出:
1073741824
-2147483648
1
2
所以,HashMap的最大容量就是2的30次方。
(3)HashMap默认加载因子是0.75
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
HashMap表征hash表的填满程度,让我们看一下源码对load factor的解释:
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
通常来说,加载因子的默认值0.75在时间性能和空间消耗之间达到了平衡。较高的值虽然降低了空间消耗,但是却增加了查找时间(反映在HashMap大多数的操作上,包括get和put)。当设置初始容量的时候,应该考虑将要放入map中的元素数量和加载因子,以减少rehash的次数。如果初始的容量比预计的entry数量除以加载因子的商还要大,那么永远不需要rehash操作。
(4)HashMap默认树化(链表转换成红黑树)阈值是8
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
Java8及以后的版本中,HashMap底层数据结构引入了红黑树,当添加元素的时候,如果桶中链表元素超过8,会自动转为红黑树。那么阈值为什么是8呢?来看HashMap源码中的这段注释:
* Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
理想状态中,在随机哈希码情况下,对于默认0.75的加载因子,桶中节点的分布频率服从参数约为0.5的泊松分布,即使粒度调整会产生较大方差。从数据中可以看到链表中元素个数为8时的概率非常非常小了,所以链表转换红黑树的阈值选择了8。
(5)HashMap中一个树的链表还原阈值是6
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
链表树化阀值是8,那么树还原为链表为什么是6而不是7呢?这是为了防止链表和树之间频繁的转换。如果是7的话,假设一个HashMap不停的插入、删除元素,链表个数一直在8左右徘徊,就会频繁树转链表、链表转树,效率非常低下。
(5)HashMap的最小树化容量是64
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
为什么是64呢?这是因为容量低于64时,哈希碰撞的机率比较大,而这个时候出现长链表的可能性会稍微大一些,这种原因下产生的长链表,我们应该优先选择扩容而避免不必要的树化。
参考链接: