Why do hash maps in Java 8 use a binary tree instead of a linked list?

weixin_33835103

Original link: https://my.oschina.net/u/2935389/blog/3041636

Q: I recently learned that in Java 8, hash maps use a binary tree instead of a linked list, and that the hash code is used as the branching factor. I understand that under high collision rates, using binary trees reduces lookup from O(n) to O(log n). My question is what good this really does, since the amortized time complexity is still O(1). If you force all entries into the same bucket by returning the same hash code for every key, you might see a significant time difference, but no one in their right mind would do that.

A binary tree also uses more space than a singly linked list, because each node stores both left and right child references. Why increase the space complexity when there is no improvement in time complexity except in some contrived test cases?

A: This is mostly a security-related change. In normal situations it is rare to have many collisions, but if the hash keys come from an untrusted source (e.g. HTTP header names received from a client), it is possible, and not very hard, to craft the input so that the resulting keys all have the same hash code. If you then perform many lookups, you may experience denial of service. There appears to be quite a lot of code in the wild that is vulnerable to this kind of attack, so it was decided to fix this on the Java side.

For more information refer to JEP-180.
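The crafted-collision scenario in the answer can be illustrated with plain strings. The sketch below relies on the fact that "Aa" and "BB" have the same String.hashCode() (2112), so any equal-length concatenation of those two blocks collides as well. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollidingStrings {
    // Builds 2^blocks distinct strings out of the blocks "Aa" and "BB".
    // Both blocks hash to 2112, so every generated string of the same
    // length ends up with the same hashCode().
    static List<String> generate(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(result.size() * 2);
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = generate(10); // 2^10 = 1024 distinct strings
        Set<Integer> hashes = new HashSet<>();
        for (String k : keys) {
            hashes.add(k.hashCode());
        }
        System.out.println(keys.size() + " keys, " + hashes.size() + " distinct hash code(s)");
    }
}
```

An attacker who controls the keys (for example, header names) could send such strings so that they all land in one bucket of the map.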

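PS (from the referenced original article):

When the hash function was designed, the table length n was (and still is) a power of two, and the bucket index is computed as follows (using the & bitwise operation rather than the % remainder operation):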

(n - 1) & hash

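The designers considered this approach prone to collisions. Why? Consider n - 1 = 15 (binary 1111): only the low 4 bits of the hash actually take effect, so collisions are of course likely.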
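A small sketch of the index computation makes the point concrete. The class name, the method name indexFor, and the sample hash values are assumptions chosen for illustration; only the expression (n - 1) & hash comes from the text.

```java
final class BucketIndex {
    // Index computation for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // n - 1 = 15 = binary 1111
        // These hashes differ only above the low 4 bits.
        int[] hashes = {0x0005, 0x0015, 0x1235, 0x7FFF0005};
        for (int h : hashes) {
            // Every one of them maps to bucket 5.
            System.out.println("0x" + Integer.toHexString(h) + " -> bucket " + indexFor(h, n));
        }
    }
}
```

For a power-of-two n, the mask gives the same result as Math.floorMod(hash, n), including for negative hashes, while avoiding a division.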

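So the designers chose a compromise that balances speed, utility, and quality: XOR the high 16 bits of the hash with the low 16 bits. They also explain that most hashCode implementations are already well distributed, and that when collisions do occur they are handled by an O(log n) tree. A single XOR keeps the overhead low while avoiding the collisions that would otherwise arise because the high bits do not take part in the index computation when the table is small.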
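The XOR step can be sketched as below. The method name spread and the sample values are assumptions for illustration; the operation itself is the high-16/low-16 XOR described above.

```java
final class HashSpread {
    // Fold the high 16 bits into the low 16 bits so that they can
    // influence the bucket index even when the table is small.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x00010000; // differs from b only in the high bits
        int b = 0x00020000;
        // Without spreading, both hashes fall into bucket 0.
        System.out.println("raw:    " + indexFor(a, n) + " and " + indexFor(b, n));
        // With spreading, they fall into buckets 1 and 2.
        System.out.println("spread: " + indexFor(spread(a), n) + " and " + indexFor(spread(b), n));
    }
}
```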

What happens if collisions are still frequent? The authors note in a comment that they use trees to handle them ("we use trees to handle large sets of collisions in bins"). JEP 180 describes the problem as follows:

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

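As mentioned earlier, retrieving an element from a HashMap takes essentially two steps:

First, hash the result of hashCode() and determine the bucket index;
If the key of the bucket's first node is not the one we want, search the chain using key.equals().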

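In implementations before Java 8, collisions were resolved with a linked list. When collisions occur, the two steps of a get cost O(1) + O(n). So when collisions are severe and n is large, the O(n) term clearly slows lookups down.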
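The two-step lookup with linked-list chaining can be sketched as follows. This is a deliberately simplified, hypothetical ChainedMap (fixed capacity, no resizing), not the JDK source; it only shows where the O(1) and O(n) parts come from.

```java
import java.util.Objects;

final class ChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        table = (Node<K, V>[]) new Node[capacity];
    }

    void put(K key, V value) {
        int h = Objects.hashCode(key);
        int i = (table.length - 1) & h;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                e.value = value; // key already present: overwrite
                return;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // prepend to the chain
    }

    V get(Object key) {
        int h = Objects.hashCode(key);
        // Step 1: O(1), locate the bucket from the hash.
        Node<K, V> e = table[(table.length - 1) & h];
        // Step 2: O(n) in the chain length, compare keys with equals().
        while (e != null) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
            e = e.next;
        }
        return null;
    }
}
```

With keys such as "Aa" and "BB", which share a hash code, both entries sit in the same chain and step 2 has to walk past one to find the other.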

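Java 8 therefore replaces the linked list with a red-black tree, bringing the complexity to O(1) + O(log n), which handles the problem reasonably well when n is large. The article "Java 8：HashMap的性能提升" (Java 8: HashMap Performance Improvements) includes performance test results.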

JEP 180: Handle Frequent HashMap Collisions with Balanced Trees

Author Mike Duigou
Owner Brent Christian
Type Feature
Scope Implementation
Status Closed / Delivered
Release 8
Component core-libs
Discussion core dash libs dash dev at openjdk dot java dot net
Effort M
Duration M
Reviewed by Alan Bateman
Endorsed by Brian Goetz
Created 2013/02/08 20:00
Updated 2017/06/14 18:44
Issue 8046170

Summary

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

Motivation

Earlier work in this area in JDK 8, namely the alternative string-hashing implementation, improved collision performance for string-valued keys only, and it did so at the cost of adding a new (private) field to every String instance.

The changes proposed here will improve collision performance for any key type that implements Comparable. The alternative string-hashing mechanism, including the private hash32 field added to the String class, can then be removed.

Description

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).
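To observe the high-collision case on a given JDK, one option is a key type that always returns the same hash code and implements Comparable, then timing lookups. BadKey and the count of 20,000 are hypothetical choices for this sketch; the timing it prints depends on the machine and the JDK, so no particular number should be expected.

```java
import java.util.HashMap;
import java.util.Map;

final class WorstCaseKeys {
    // Hypothetical key type: every instance reports the same hash code,
    // so all entries land in a single bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> fill(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        int count = 20_000;
        Map<BadKey, Integer> map = fill(count);
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += map.get(new BadKey(i));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " lookups in one bucket: " + elapsedMs + " ms (checksum " + sum + ")");
    }
}
```

Running the same program on a pre-Java 8 runtime and on Java 8 or later is one way to compare the O(n) and O(log n) behaviors the description refers to.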

This technique has already been implemented in the latest version of the java.util.concurrent.ConcurrentHashMap class, which is also slated for inclusion in JDK 8 as part of JEP 155. Portions of that code will be re-used to implement the same idea in the HashMap and LinkedHashMap classes. Only the implementations will be changed; no interfaces or specifications will be modified. Some user-visible behaviors, such as iteration order, will change within the bounds of their current specifications.

We will not implement this technique in the legacy Hashtable class. That class has been part of the platform since Java 1.0, and some legacy code that uses it is known to depend upon iteration order. Hashtable will be reverted to its state prior to the introduction of the alternative string-hashing implementation, and will maintain its historical iteration order.

We also will not implement this technique in WeakHashMap. An attempt was made, but the complexity of having to account for weak keys resulted in an unacceptable drop in microbenchmark performance. WeakHashMap will also be reverted to its prior state.

There is no need to implement this technique in the IdentityHashMap class. It uses System.identityHashCode() to generate hash codes, so collisions are generally rare.
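As a brief illustration of how IdentityHashMap differs, the sketch below inserts two distinct String objects with equal contents into both map types. The class and method names are hypothetical; the behavior shown follows from IdentityHashMap comparing keys by reference rather than by equals().

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityKeys {
    // Puts two equal-but-distinct String objects into the given map
    // and reports how many entries it ends up with.
    static int countEntries(Map<String, Integer> map) {
        String a = new String("key");
        String b = new String("key"); // equals(a), but a different object
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap:         " + countEntries(new HashMap<>()));         // 1
        System.out.println("IdentityHashMap: " + countEntries(new IdentityHashMap<>())); // 2
    }
}
```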

Testing

Run Map tests from Doug Lea's JSR 166 CVS workspace (includes a couple microbenchmarks)
Run performance tests of standard workloads
Possibly develop new microbenchmarks

Risks and Assumptions

This change will introduce some overhead for the addition and management of the balanced trees; we expect that overhead to be negligible.

This change will likely result in a change to the iteration order of the HashMap class. The HashMap specification explicitly makes no guarantee about iteration order. The iteration order of the LinkedHashMap class will be maintained.
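The difference between the two guarantees can be checked with a short sketch. The class name, method name, and sample keys are assumptions for illustration; the HashMap order printed here is whatever the running JDK happens to produce and should not be relied on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IterationOrder {
    // Inserts three keys in a fixed order and returns them in the
    // order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        map.put("zeta", 1);
        map.put("alpha", 2);
        map.put("mid", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order: [zeta, alpha, mid]
        System.out.println("LinkedHashMap: " + keysInIterationOrder(new LinkedHashMap<>()));
        // HashMap makes no promise; the order may differ between releases.
        System.out.println("HashMap:       " + keysInIterationOrder(new HashMap<>()));
    }
}
```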

Reposted from: https://my.oschina.net/u/2935389/blog/3041636
