HashMap 1.8 Source Notes (1)

* Hash table based implementation of the <tt>Map</tt> interface.  This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.)  This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.

Hash-table-based implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. (HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) The class makes no guarantee about the order of the map; in particular, it does not guarantee that the order stays constant over time.

* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets.  Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings).  Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.

Assuming the hash function disperses elements properly among the buckets, this implementation provides constant-time performance for the basic operations (get and put). Iterating over collection views takes time proportional to the HashMap's "capacity" (the number of buckets) plus its size (the number of key-value mappings). So if iteration performance matters, it is important not to set the initial capacity too high (or the load factor too low).

* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>.  The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created.  The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased.  When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.

A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity when the table is created. The load factor is a measure of how full the table is allowed to get before its capacity is automatically increased. When the number of entries exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structures are rebuilt) so that it has roughly twice as many buckets.
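The resize rule just described can be sketched in a few lines. The class and method names below are mine, not JDK internals; the arithmetic simply restates the rule from the Javadoc:

```java
// Sketch of HashMap's documented resize trigger: rehash when size exceeds
// capacity * loadFactor, roughly doubling the bucket count.
public class ResizeSketch {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;         // the default initial capacity
        float loadFactor = 0.75f;  // the default load factor
        // With the defaults, the 13th entry triggers a resize to 32 buckets.
        System.out.println(threshold(capacity, loadFactor)); // 12
        System.out.println(capacity * 2);                    // 32
    }
}
```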

* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs.  Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations.  If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.

As a general rule, the default load factor of 0.75 offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most HashMap operations, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.
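The no-rehash condition can be turned into a starting capacity with a small helper (presizedCapacity is a hypothetical name of my own, not a JDK method):

```java
import java.util.HashMap;
import java.util.Map;

public class Presize {
    // Smallest capacity for which expectedEntries never exceeds
    // capacity * loadFactor, so no rehash should occur while filling.
    static int presizedCapacity(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        int cap = presizedCapacity(100, 0.75f); // ceil(100 / 0.75) = 134
        Map<String, Integer> m = new HashMap<>(cap);
        for (int i = 0; i < 100; i++) m.put("k" + i, i);
        System.out.println(cap); // 134
    }
}
```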

* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table.  Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.

If you know in advance that many mappings will be stored, create the HashMap with a sufficiently large initial capacity; the mappings are then stored more efficiently than if the table had to grow by repeated automatic rehashing. Note that many keys sharing the same hashCode() is a sure way to slow any hash table down. To soften the impact, when such keys implement Comparable, HashMap may use their comparison order to break ties.
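To see the Comparable fallback matter, here is a deliberately pathological key type (my own, for illustration only) whose hashCode always collides. Lookups remain correct regardless, and because the key is Comparable, JDK 8's tree bins can keep them reasonably fast:

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeys {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Insert n colliding keys, then look one up.
    static Integer lookup(int n, int probe) {
        Map<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) m.put(new BadKey(i), i);
        return m.get(new BadKey(probe));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1000, 500)); // 500 -- correct despite collisions
    }
}
```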

* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally.  (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.)  This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.

HashMap is not synchronized. If multiple threads access a hash map concurrently and at least one of them modifies it structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key the instance already contains is not a structural modification.) This is typically done by synchronizing on some object that naturally encapsulates the map.

* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method.  This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
*   Map m = Collections.synchronizedMap(new HashMap(...));</pre>

If no such object exists, the map should be wrapped with Collections.synchronizedMap. To prevent accidental unsynchronized access, this is best done at creation time:

Map m = Collections.synchronizedMap(new HashMap(...));
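A slightly fuller version of the wrapping idiom, with generics. Note that, per the Collections.synchronizedMap documentation, iteration over the wrapped map must still be manually synchronized on the wrapper:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncWrap {
    static Map<String, Integer> create() {
        // Wrap at creation time so no unsynchronized reference escapes.
        return Collections.synchronizedMap(new HashMap<String, Integer>());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = create();
        m.put("a", 1);
        // Individual calls (put, get, ...) are synchronized by the wrapper,
        // but iteration is a sequence of calls and needs an explicit lock:
        synchronized (m) {
            for (Map.Entry<String, Integer> e : m.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}
```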

* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}.  Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.

The iterators returned by all of this class's collection-view methods are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way other than through the iterator's own remove method, the iterator throws ConcurrentModificationException. Faced with concurrent modification, the iterator thus fails quickly and cleanly rather than risking arbitrary, non-deterministic behavior at some undetermined point in the future.
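The fail-fast behavior is easy to trigger deliberately; in this sketch, modifying the map between next() calls makes the iterator throw:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    static boolean modificationDetected() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        Iterator<String> it = m.keySet().iterator();
        it.next();
        m.put("c", 3); // structural modification behind the iterator's back
        try {
            it.next(); // the fail-fast check fires here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // true
    }
}
```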

* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification.  Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>

Note that the fail-fast behavior of an iterator cannot be guaranteed; generally speaking, it is impossible to make hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis, so it would be wrong to write a program whose correctness depends on this exception: the fail-fast behavior should be used only to detect bugs.

* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.

HashMap is a member of the Java Collections Framework.

* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node).  Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.

This map normally acts as a binned (bucketed) hash table, but when bins grow too large they are converted into bins of TreeNodes, each structured much like those in java.util.TreeMap. Most methods try to use normal bins, and switch to TreeNode methods where applicable (simply by checking instanceof on a node). Bins of TreeNodes can be traversed and used like any others, but additionally support faster lookup when overpopulated. Since the vast majority of bins in normal use are not overpopulated, checks for the existence of tree bins may be deferred in the course of table methods.

* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor).  The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hash code; on ties, if two elements are of the same "class C implements Comparable<C>" type, their compareTo method is used for ordering. (Generic types are conservatively checked via reflection to validate this, see the comparableClassFor method.) The added complexity of tree bins is worthwhile because it provides worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance therefore degrades gracefully under accidental or malicious use in which hashCode() returns poorly distributed values, or in which many keys share a hash code, as long as the keys are also Comparable. (If neither applies, we may waste roughly a factor of two in time and space compared with taking no precautions, but the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.  In
* usages with well-distributed user hashCodes, tree bins are
* rarely used.  Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:

Because TreeNodes are about twice the size of regular nodes, they are used only when a bin contains enough nodes to warrant it (see TREEIFY_THRESHOLD); when a bin shrinks again (through removal or resizing), it is converted back to a plain bin. With well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter averaging about 0.5 for the default resizing threshold of 0.75, though with large variance because of resizing granularity. Ignoring variance, the expected number of occurrences of list size k is exp(-0.5) * pow(0.5, k) / factorial(k). The first values are:

*
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million
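The table above can be reproduced directly from the quoted formula, P(k) = exp(-0.5) * 0.5^k / k!:

```java
public class PoissonTable {
    // Poisson probability with parameter 0.5, as in the HashMap comment.
    static double p(int k) {
        double fact = 1;
        for (int i = 2; i <= k; i++) fact *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / fact;
    }

    public static void main(String[] args) {
        // Prints the same values as the comment's table, to 8 decimals.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p(k));
        }
    }
}
```

Note how fast the probabilities fall off: under this model, the chance of a bin reaching the treeify threshold of 8 is under one in ten million.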

* The root of a tree bin is normally its first node.  However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).

The root of a tree bin is normally its first node. Sometimes (currently only upon Iterator.remove) the root may be elsewhere, but it can be recovered by following parent links (see TreeNode.root()).

* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.

All applicable internal methods accept a hash code as an argument (normally supplied by a public method), letting them call one another without recomputing user hashCodes. Most internal methods also take a "tab" argument, which is normally the current table but may be a new or old one during resizing or conversion.

* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.

When bin lists are treeified, split, or untreeified, they are kept in the same relative access/traversal order (i.e., the Node.next field) to better preserve locality and to slightly simplify the handling of splits and of traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes are compared as tie-breakers to keep a total ordering (or as close as is required here) across rebalancings.

* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)

The use of, and transitions between, plain and tree modes are complicated by the existence of the LinkedHashMap subclass. Hook methods (defined below) are invoked on insertion, removal, and access so that LinkedHashMap internals can otherwise stay independent of these mechanics. (This also requires passing a map instance to some utility methods that may create new nodes.)

* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/

The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all the twisty pointer operations.
