HashMap Under the Hood---the java.util.HashMap class documentation and implementation notes

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
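
As a quick, standalone illustration of the null tolerance described above (the demo class name is just for this sketch):

<pre>
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for the null key");   // a single null key is allowed
        map.put("k", null);                        // null values are allowed too
        System.out.println(map.get(null));         // prints: value for the null key
        System.out.println(map.containsKey("k"));  // prints: true

        // Hashtable, by contrast, rejects nulls:
        try {
            new Hashtable<String, String>().put("k", null);
        } catch (NullPointerException e) {
            System.out.println("Hashtable refuses nulls");
        }
    }
}
</pre>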

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
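
The capacity-proportional iteration cost is easy to observe; a sketch (timings will vary by machine) comparing a needlessly large table against a default-sized one holding the same ten entries:

<pre>
import java.util.HashMap;
import java.util.Map;

public class IterationCostDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> oversized = new HashMap<>(1 << 20); // ~1M buckets
        Map<Integer, Integer> rightSized = new HashMap<>();       // 16 buckets
        for (int i = 0; i < 10; i++) {
            oversized.put(i, i);
            rightSized.put(i, i);
        }

        long t0 = System.nanoTime();
        for (int v : oversized.values()) { }  // visits 10 entries, scans ~1M buckets
        long t1 = System.nanoTime();
        for (int v : rightSized.values()) { } // visits 10 entries, scans 16 buckets
        long t2 = System.nanoTime();

        System.out.printf("oversized:   %d us%n", (t1 - t0) / 1_000);
        System.out.printf("right-sized: %d us%n", (t2 - t1) / 1_000);
    }
}
</pre>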

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
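
A sketch of the two parameters in use; the arithmetic in the comments follows the rule just stated (a resize is triggered once the entry count exceeds loadFactor times capacity):

<pre>
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        // 16 buckets to start; threshold = 16 * 0.75 = 12 entries
        Map<Integer, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 0; i < 13; i++) {
            map.put(i, i); // the 13th put exceeds the threshold, so the table
                           // is rehashed to roughly twice the buckets (32)
        }
    }
}
</pre>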

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
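
A common way to apply that sizing rule when presizing a map (the helper method below is illustrative, not part of the JDK):

<pre>
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    /** Initial capacity large enough that expectedEntries never triggers a rehash. */
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / loadFactor);
    }

    public static void main(String[] args) {
        int expected = 10_000;
        // ceil(10000 / 0.75) = 13334, which HashMap rounds up internally to the
        // next power of two (16384); 16384 * 0.75 > 10000, so loading all
        // 10000 mappings causes no resize.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected, 0.75f));
        for (int i = 0; i < expected; i++) {
            map.put("key-" + i, i);
        }
    }
}
</pre>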

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same {@code hashCode()} is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are {@link Comparable}, this class may use comparison order among keys to help break ties.

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

If no such object exists, the map should be "wrapped" using the {@link Collections#synchronizedMap Collections.synchronizedMap} method. This is best done at creation time, to prevent accidental unsynchronized access to the map:<pre> Map m = Collections.synchronizedMap(new HashMap(...));</pre>
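
A fuller sketch of the wrapping idiom; note that the synchronized wrapper's documentation also requires callers to hold the map's monitor for the whole of any iteration:

<pre>
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedWrapDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1); // individual calls are synchronized by the wrapper

        // Iteration is a sequence of calls and is NOT atomic by itself:
        synchronized (m) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
</pre>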

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a {@link ConcurrentModificationException}. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs. This class is a member of the <a href="{@docRoot}/../technotes/guides/collections/index.html"> Java Collections Framework</a>.
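
A small sketch of the fail-fast behavior (keeping in mind the best-effort caveat above, so the exception is likely rather than guaranteed):

<pre>
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural modification outside the iterator
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast iterator detected the modification");
        }

        // The supported way to remove while iterating:
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // the iterator's own remove is safe
        }
        System.out.println(map.isEmpty()); // prints: true
    }
}
</pre>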

Below are the implementation notes from the source code of java.util.HashMap itself.

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when applicable (simply by checking instanceof a node). Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for existence of tree bins may be delayed in the course of table methods.

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>" type, then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable. Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
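
The tie-breaking and worst-case behavior can be exercised with a deliberately bad key type (an illustrative sketch; BadKey is not a real JDK class):

<pre>
import java.util.HashMap;
import java.util.Map;

public class CollidingKeyDemo {
    /** Every instance hashes identically -- the worst case for a hash table. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // deliberately terrible
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        // All 10000 entries land in one bin; because BadKey is Comparable,
        // the bin is treeified and this lookup costs O(log n), not O(n).
        System.out.println(map.get(new BadKey(9_999)));
    }
}
</pre>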

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006
more: less than 1 in ten million
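
The table above can be reproduced directly from the stated formula (a short sketch assuming the Poisson parameter of 0.5 quoted in the comment):

<pre>
public class PoissonBins {
    public static void main(String[] args) {
        double lambda = 0.5; // expected entries per bin at the 0.75 threshold
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p);
        }
    }
}
</pre>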

The root of a tree bin is normally its first node. However, sometimes (currently only upon Iterator.remove), the root might be elsewhere, but can be recovered following parent links (method TreeNode.root()).

All applicable internal methods accept a hash code as an argument (as normally supplied from a public method), allowing them to call each other without recomputing user hashCodes. Most internal methods also accept a "tab" argument, that is normally the current table, but may be a new or old one when resizing or converting.

When bin lists are treeified, split, or untreeified, we keep them in the same relative access/traversal order (i.e., field Node.next) to better preserve locality, and to slightly simplify handling of splits and traversals that invoke iterator.remove. When using comparators on insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.

The use and transitions among plain vs tree modes is complicated by the existence of subclass LinkedHashMap. See below for hook methods defined to be invoked upon insertion, removal and access that allow LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.)

The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all of the twisty pointer operations.
