1. Why is the load factor 0.75?
static final float DEFAULT_LOAD_FACTOR = 0.75f;
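- The source code explains it as follows: with a load factor of 0.75, space utilization stays fairly high while a good share of hash collisions is avoided, so the linked lists or red-black trees in the buckets stay short and lookups stay fast.
- A higher load factor reduces space overhead, but hash collisions become more frequent and lookups slow down. A lower load factor keeps lookups fast but leaves more of the table unused. The Javadoc describes 0.75 as a good tradeoff between the two costs.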
/*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur. */
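The last sentence of the Javadoc can be read as a sizing rule. The following is a minimal sketch of that rule, not JDK code: `capacityFor` and `thresholdFor` are hypothetical helper names introduced here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadFactorSizing {
    // Hypothetical helper (not a JDK method): an initial capacity that is
    // strictly greater than expectedEntries / loadFactor, as the Javadoc suggests.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    // Hypothetical helper: resize threshold for a table of the given capacity.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default table: 16 buckets * 0.75; the table is resized once
        // the number of entries exceeds this threshold.
        System.out.println(thresholdFor(16, 0.75f));

        // Pre-size a map for 100 entries so that no rehash is needed.
        int capacity = capacityFor(100, 0.75f);
        Map<String, Integer> map = new HashMap<>(capacity);
        for (int i = 0; i < 100; i++) {
            map.put("key" + i, i);
        }
        System.out.println(capacity + " -> " + map.size() + " entries");
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the actual table may be larger than the value passed to the constructor.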
2. Why is a chain converted to a red-black tree once its length exceeds 8?
static final int TREEIFY_THRESHOLD = 8;
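- One detail first: when the table capacity is at least the minimum treeify capacity (MIN_TREEIFY_CAPACITY = 64), the put method converts a bin to a red-black tree once its chain reaches 9 nodes, whereas the compute method converts it once the chain reaches 8 nodes. Below that capacity the table is resized instead of treeified. If you are interested, see my HashMap source code walkthrough.
- Back to the question: why 8?
- In short, a TreeNode is about twice the size of a regular Node, so red-black trees are not used while chains are short. Ideally, with random hashCodes, the number of nodes per bin follows a Poisson distribution. The probability of a chain of length k is listed in the comment below, and the probability of reaching 8 is so small that there is no need to raise the threshold any further.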
/*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*/
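The table in the comment can be recomputed directly from the formula it quotes, exp(-0.5) * pow(0.5, k) / factorial(k). The sketch below does that; the class and method names are illustrative and do not come from the JDK.

```java
final class PoissonBinSizes {
    // Poisson probability mass function: exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average bin load the source comment assumes
        // for the default load factor of 0.75.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

Printed to eight decimal places, the values should agree with the table in the source comment.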
3. Why does a red-black tree fall back to a linked list when the bin shrinks to 6 nodes?
static final int UNTREEIFY_THRESHOLD = 6;
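- The usual explanation is that a small bin is converted back to a linked list to reduce memory consumption. Why not 7? Leaving a gap between the two thresholds avoids frequent switching between list and tree when a bin's size hovers around the boundary.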
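The effect of the gap can be illustrated with a toy model. This is not HashMap's actual conversion logic, only a sketch that counts how often a bin would switch representation for a given pair of thresholds; `countConversions` is a hypothetical helper.

```java
final class ThresholdHysteresis {
    // Toy model (not HashMap's real code): walk through a sequence of bin sizes
    // and count every switch between list form and tree form.
    static int countConversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean isTree = false;
        int conversions = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;      // list -> tree
                conversions++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;     // tree -> list
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        // A bin whose size keeps flipping between 7 and 8 nodes.
        int[] sizes = {7, 8, 7, 8, 7, 8, 7, 8};
        System.out.println("thresholds 8/6: " + countConversions(sizes, 8, 6));
        System.out.println("thresholds 8/7: " + countConversions(sizes, 8, 7));
    }
}
```

In this model the 8/6 pair converts the bin once and then leaves it alone, while an 8/7 pair converts it on almost every size change.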
4. Why is the default capacity 16?
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
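- First, the capacity must be a power of two, because the bucket index is computed with a bitwise AND, (n - 1) & hash. Even if you specify a capacity that is not a power of two, tableSizeFor() rounds it up to the smallest power of two greater than or equal to it.
- Why not 4 or 8? A capacity that small would trigger resizing repeatedly as the map grows, which costs more than it saves.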
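Both points about powers of two can be checked with a short sketch. `roundUpToPowerOfTwo` is a simplified stand-in for the rounding that tableSizeFor() performs, not the JDK implementation, and it assumes 1 <= cap <= 2^30. `indexFor` is likewise a hypothetical name for the masking step.

```java
final class PowerOfTwoCapacity {
    // Simplified stand-in for tableSizeFor(): smallest power of two >= cap.
    // Assumes 1 <= cap <= 2^30 and omits HashMap's MAXIMUM_CAPACITY clamp.
    static int roundUpToPowerOfTwo(int cap) {
        int highest = Integer.highestOneBit(cap);
        return (highest == cap) ? cap : highest << 1;
    }

    // Bucket index via masking: valid only when n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10));
        System.out.println(roundUpToPowerOfTwo(16));

        // For a power-of-two n, masking gives the same bucket as a
        // non-negative modulo, including for negative hash values.
        int n = 16;
        for (int hash : new int[] {37, -1, 123456789}) {
            System.out.println(indexFor(hash, n) + " == " + Math.floorMod(hash, n));
        }
    }
}
```

The AND replaces a comparatively expensive modulo operation, and it only gives a correct index when every bit of n - 1 is set, which is why the capacity is forced to a power of two.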