Why Does a Java Map Bin (Linked List) Turn into a Red-Black Tree Only When Its Length Exceeds 8?

Author: stuqbx

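This is an original article by the author. Please respect the original work: reposting without the author's permission is prohibited, and the author reserves the right to pursue violations.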

Article link: https://blog.csdn.net/stuqbx/article/details/88897621

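This article discusses why HashMap converts a linked list into a red-black tree and why the threshold is set to 8. Using probability and an analysis of space usage, it explains how this design balances lookup efficiency against storage cost.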

Why convert at all?

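The elements of a bin in a Map are initially stored as a linked list, whose lookup cost is O(n), whereas a tree structure brings the lookup cost down to O(log n). While the list is short, even a full traversal is very fast. As the list keeps growing, though, lookups slow down noticeably, and that is why the list needs to be converted into a tree.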
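To make the O(n) versus O(log n) difference concrete, here is a minimal sketch that compares worst-case node visits. It is an idealization, not HashMap's actual code: the class and method names are hypothetical, and the tree is assumed to be perfectly balanced (a real red-black tree can be up to roughly twice as tall).

```java
class LookupCostSketch {
    // Worst case in a linked list: the key is last (or absent),
    // so all n nodes are visited.
    static int listWorstCase(int n) {
        return n;
    }

    // Levels in a perfectly balanced binary tree holding n nodes:
    // ceil(log2(n + 1)), which for n >= 0 equals the bit length of n.
    // Assumption: perfect balance, which a red-black tree only approximates.
    static int balancedTreeLevels(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 64, 1024}) {
            System.out.println(n + " nodes: list " + listWorstCase(n)
                    + ", balanced tree " + balancedTreeLevels(n));
        }
    }
}
```

For a handful of nodes the two numbers are close, which is why a short list is not worth converting. The gap only becomes significant once a bin grows large.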

Why is the threshold 8?

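After conversion, the bin is stored as TreeNodes, which take about twice the space of regular Nodes. A bin is therefore converted to TreeNodes only when it contains enough nodes to justify the cost, and "enough" is defined by the value of TREEIFY_THRESHOLD.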

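When hashCodes are well distributed, tree bins are rarely used (a bin, or bucket, is the slot where HashMap keeps the entries whose hashes map to the same table index). Because the data is spread evenly across the bins, almost no bin ever has a list long enough to reach the threshold. In fact, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution).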
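The claim that long lists are rare under well-distributed hashes can be checked empirically. The sketch below is a simulation under stated assumptions, not part of the JDK: it throws random keys into a fixed number of bins with an assumed average of 0.5 keys per bin, then reports what share of bins holds exactly k keys. The class name, the seed, and the table size are arbitrary choices for illustration.

```java
import java.util.Locale;
import java.util.Random;

class BinOccupancySimulation {
    // Returns fractions[k] = share of bins that hold exactly k keys.
    // Bins with more than maxSize keys are lumped into the last slot.
    static double[] simulate(int bins, int keys, int maxSize, long seed) {
        Random random = new Random(seed);
        int[] sizes = new int[bins];
        for (int i = 0; i < keys; i++) {
            // Assumption: a perfectly random hash, modelled as a uniform bin index.
            sizes[random.nextInt(bins)]++;
        }
        double[] fractions = new double[maxSize + 1];
        for (int size : sizes) {
            fractions[Math.min(size, maxSize)] += 1.0 / bins;
        }
        return fractions;
    }

    public static void main(String[] args) {
        int bins = 1 << 16;
        // Half as many keys as bins gives an average of 0.5 keys per bin.
        double[] fractions = simulate(bins, bins / 2, 8, 42L);
        for (int k = 0; k < fractions.length; k++) {
            System.out.printf(Locale.ROOT, "%d: %.8f%n", k, fractions[k]);
        }
    }
}
```

With this setup most bins should come out empty or holding a single key, and bins with many keys should be vanishingly rare, in line with the Poisson figures quoted in the JDK comment.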

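With the default resizing threshold (load factor) of 0.75, the distribution has a parameter of about 0.5 on average, although with a large variance because of resizing granularity. Ignoring variance, the expected frequency of a list of size k is

$$P(k) = \frac{e^{-0.5} \cdot 0.5^{k}}{k!}$$

which gives the following probabilities: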

Length	Probability
0	0.60653066
1	0.30326533
2	0.07581633
3	0.01263606
4	0.00157952
5	0.00015795
6	0.00001316
7	0.00000094
8	0.00000006

As shown above, the probability that the list in a bin reaches 8 elements is 0.00000006, which makes it a practically impossible event.
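The table can be recomputed directly from the formula in the JDK comment, exp(-0.5) * pow(0.5, k) / factorial(k). The following is a minimal sketch. The class and method names are hypothetical, and the parameter 0.5 is taken from the comment rather than derived here.

```java
import java.util.Locale;

class PoissonBinSizes {
    // Average number of nodes per bin, as stated in the JDK comment ("about 0.5").
    static final double LAMBDA = 0.5;

    // Poisson probability of exactly k nodes: exp(-lambda) * lambda^k / k!
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf(Locale.ROOT, "%d:    %.8f%n", k, probability(k));
        }
    }
}
```

The computed values should agree with the table to within rounding in the last printed digit.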

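In the vast majority of cases, storing a bin as a linked list saves space while still giving good lookup performance. In the rare case where a bin reaches the threshold of 8 nodes, it is converted into a red-black tree for faster lookups. Because such cases are so rare, the extra storage they require is negligible.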

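So the threshold of 8 is a trade-off between time and space, chosen on the basis of probability. One cannot help but admire how rigorous and well-founded every change and optimization in Java has been over its 30 years of development.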

Appendix: relevant comments in the JDK (1.8.0_45)

HashMap class, lines 174 to 197

     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

ConcurrentHashMap has a similar passage at lines 327 to 349, which differs only in minor details.

     * The main disadvantage of per-bin locks is that other update
     * operations on other nodes in a bin list protected by the same
     * lock can stall, for example when user equals() or mapping
     * functions take a long time.  However, statistically, under
     * random hash codes, this is not a common problem.  Ideally, the
     * frequency of nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average, given the resizing threshold
     * of 0.75, although with a large variance because of resizing
     * granularity. Ignoring variance, the expected occurrences of
     * list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
     * first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million