为什么HashMap桶长度超过8才会转换成红黑树

同时发布:https://daiyuhe.com/bucket-convert-to-red-black-tree-when-8-size/

百度了一下,感觉能说清楚的并不多,其中有一个非常多的回答是以链表和红黑树的平均查找长度来解释,但是这种解释的所有回答中都没有提到出处,始终让人不能彻底相信。通过查阅官方文档和源码之后,在这里说说我的理解。

HashMap中的注释有一段Implementation notes,大致说明了我们的问题,我们来看看。

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same “class C implements Comparable<C>”, type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this – see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

这一段说明了为什么要使用红黑树:

红黑树的插入、删除和遍历的最坏时间复杂度都是log(n),因此,在意外或者恶意使用导致hashCode()方法返回值的分布很糟糕,以及在那些许多key共享一个hashCode的情况下,只要Key具有可比性,性能的下降将会是"优雅"的。(如果这两种方法都不适用,与不采取预防措施相比,我们可能会浪费大约两倍的时间和空间。但目前所知的唯一案例来自于糟糕的用户编程实践,这些实践已经非常缓慢,以至于没有什么区别。)

接下来看另外一段:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million

这一段解释了我们的问题:

由于TreeNodes的大小大约是常规节点的两倍,因此我们仅在容器包含足够的节点以保证使用时才使用它们(参见 TREEIFY_THRESHOLD 值)。当它们变得太小(由于移除或调整大小)时,它们会被转换回普通的bin。理想情况下,在随机哈希代码下,bin中的节点频率遵循泊松分布,下面就是list size k 的频率表。

结论
由频率表可以看出,桶的长度超过8的概率是非常小的,结合上一段提到的恶意使用,我想,加入红黑树的其中一个主要原因是为了防止恶意使用带来的性能骤降。作者根据概率统计而选择了8作为阀值,由此可见,将阈值定为8这个选择是非常严谨和科学的一个决定。

  • 0
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 12
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 12
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值