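This article walks through parts of the HashMap source code in the order they appear. The English passages quoted from the source are HashMap's own comments, the paragraph after each one restates it in plain language, and the code lines are the corresponding source. For reference only.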
I. Class-level comments
Implementation notes.
Notes on how the class is implemented.
This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when applicable (simply by checking instanceof a node). Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for existence of tree bins may be delayed in the course of table methods.
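This map normally works as a bucketed hash table. When a bucket grows too large, it is converted into a bucket of TreeNodes, structured much like the nodes in java.util.TreeMap. Most methods work on ordinary buckets and hand off to the TreeNode methods where applicable, which they detect with a simple instanceof check on the node. Tree buckets can be traversed and used like any other bucket, and they also support faster lookup when overpopulated. Because the vast majority of buckets in normal use are not overpopulated, table methods may defer the check for tree buckets.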
Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>", type then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this -- see method comparableClassFor). The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable, Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
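Tree bins (bins whose elements are all TreeNodes) are ordered primarily by hashCode. When hash codes tie and both elements belong to the same class C that implements Comparable<C>, their compareTo method decides the order. (This is validated conservatively by checking generic types via reflection; see the method comparableClassFor.) The extra complexity of tree bins pays off by giving worst-case O(log n) operations when keys either have distinct hashes or are orderable. Performance therefore degrades gracefully under accidental or malicious use in which hashCode() returns poorly distributed values, and also when many keys share a hashCode, provided the keys are Comparable. (If neither condition holds, about a factor of two in time and space may be wasted compared with taking no precautions. The only known cases come from poor user programming practices that are already so slow that this makes little difference.)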
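To make the "many keys share a hashCode" case concrete, here is a minimal sketch. The key class `CollidingKey` and the helper `build` are hypothetical names invented for this illustration; only the public `HashMap` API is used, and the sketch does not inspect whether the bin was actually treeified.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeyDemo {
    /** Hypothetical key type: every instance reports the same hash code. */
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;

        CollidingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately poor: all keys land in one bin
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }

        @Override
        public int compareTo(CollidingKey other) {
            // Comparable gives the map a way to order keys whose hashes tie.
            return Integer.compare(id, other.id);
        }
    }

    /** Fills a HashMap with `size` keys that all collide on hashCode. */
    static Map<CollidingKey, Integer> build(int size) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }
}
```

The map stays correct either way; what the comment above promises is that lookups among such keys need not degrade to a linear scan, because the keys are mutually Comparable.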
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:
k | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
frequency | 0.60653066 | 0.30326533 | 0.07581633 | 0.01263606 | 0.00157952 | 0.00015795 | 0.00001316 | 0.00000094 | 0.00000006
more: less than 1 in ten million
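Because a TreeNode is about twice the size of a regular node, tree bins are used only when a bin holds enough nodes to justify them (see TREEIFY_THRESHOLD). When a tree bin becomes too small, through removal or resizing, it is converted back to a plain bin. With well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the number of nodes per bin follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) whose parameter averages about 0.5 at the default resizing threshold of 0.75, although the variance is large because of resizing granularity. Ignoring variance, the expected frequency of a list of size k is exp(-0.5) * pow(0.5, k) / factorial(k). The values for k = 0 through 8 are those tabulated above; anything larger occurs less than once in ten million.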
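The quoted frequencies can be reproduced directly from the formula exp(-0.5) * pow(0.5, k) / factorial(k). The class and method names below are made up for this sketch and are not part of the JDK.

```java
final class BinSizeDistribution {
    /** Poisson probability of exactly k events for the given mean. */
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    /** Expected frequency of a bin holding k nodes, using the 0.5 parameter from the comment. */
    static double frequency(int k) {
        return poisson(0.5, k);
    }

    public static void main(String[] args) {
        // Prints one line per bin size, in the same k: value layout as the source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, frequency(k));
        }
    }
}
```

At k = 8, the value of TREEIFY_THRESHOLD, the frequency is already below one in ten million, which is why tree bins are rare when hash codes are well distributed.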
The root of a tree bin is normally its first node. However, sometimes (currently only upon Iterator.remove), the root might be elsewhere, but can be recovered following parent links (method TreeNode.root()).
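The root of a tree bin is normally its first node. Occasionally (currently only after Iterator.remove) the root may sit elsewhere, but it can be recovered by following parent links (method TreeNode.root()).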
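Recovering the root by following parent links amounts to walking upward until a node has no parent. The sketch below uses a hypothetical, stripped-down `SketchNode` class rather than the real `TreeNode`, which carries many more fields.

```java
final class RootRecoverySketch {
    /** Hypothetical minimal tree node: only the parent link matters here. */
    static final class SketchNode {
        final String label;
        SketchNode parent;

        SketchNode(String label, SketchNode parent) {
            this.label = label;
            this.parent = parent;
        }

        /** Walks parent links upward and returns the node that has no parent. */
        SketchNode root() {
            SketchNode r = this;
            while (r.parent != null) {
                r = r.parent;
            }
            return r;
        }
    }
}
```

The walk costs one step per tree level, so on a balanced tree it stays logarithmic in the bin size.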
All applicable internal methods accept a hash code as an argument (as normally supplied from a public method), allowing them to call each other without recomputing user hashCodes. Most internal methods also accept a "tab" argument, that is normally the current table, but may be a new or old one when resizing or converting.
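All applicable internal methods take a hash code as an argument (normally supplied by a public method), so they can call one another without recomputing user hashCodes. Most internal methods also take a "tab" argument, which is normally the current table but may be a new or old one during resizing or conversion.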
When bin lists are treeified, split, or untreeified, we keep them in the same relative access/traversal order (i.e., field Node.next) to better preserve locality, and to slightly simplify handling of splits and traversals that invoke iterator.remove. When using comparators on insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.
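When bin lists are treeified, split, or untreeified, they keep the same relative access/traversal order (that is, the order of field Node.next). This better preserves locality and slightly simplifies the handling of splits and of traversals that invoke iterator.remove. When comparators are used on insertion, classes and identityHashCodes serve as tie-breakers so that a total ordering (or one as close to total as required here) is kept across rebalancings.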
The use and transitions among plain vs tree modes is complicated by the existence of subclass LinkedHashMap. See below for hook methods defined to be invoked upon insertion, removal and access that allow LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.) The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all of the twisty pointer operations.
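The existence of the subclass LinkedHashMap complicates the use of, and transitions between, plain and tree modes. Hook methods, defined further down in the source, are invoked upon insertion, removal, and access so that LinkedHashMap internals can otherwise stay independent of these mechanics. (This also requires passing a map instance to some utility methods that may create new nodes.) The SSA-based coding style, similar to that used in concurrent programming, helps avoid aliasing errors amid all the twisty pointer operations.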
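The hook methods themselves are internal to the JDK, so they cannot be called from user code. Their effect on access can still be observed through the public LinkedHashMap constructor that takes an accessOrder flag. The class name `AccessOrderDemo` is made up for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AccessOrderDemo {
    /** Returns the iteration order of keys after reading the oldest entry. */
    static List<String> keysAfterAccess() {
        // Third argument true = order entries by most recent access, not by insertion.
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        map.get("a"); // the access moves "a" to the end of the iteration order
        return new ArrayList<>(map.keySet());
    }
}
```

The reordering happens inside LinkedHashMap, while the lookup itself is still carried out by the HashMap code it inherits.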
II. Field source code
The default initial capacity - MUST be a power of two.
The default initial capacity. It must be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
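One reason the capacity must be a power of two is that a bin index can then be taken with a bit mask instead of a remainder operation. The sketch below illustrates the idea; `indexFor` is a hypothetical helper name, not a method of HashMap.

```java
final class BucketIndexSketch {
    /**
     * Maps a hash to a bin index by masking with (tableLength - 1).
     * Only valid when tableLength is a power of two.
     */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int capacity = 1 << 4; // 16, the default initial capacity
        for (int hash : new int[] {0, 17, -1, 123456789}) {
            System.out.println(hash + " -> bin " + indexFor(hash, capacity));
        }
    }
}
```

For a power-of-two length the mask gives the same result as Math.floorMod(hash, tableLength), including for negative hashes, and it always lands inside the table.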
The maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments.
MUST be a power of two <= 1<<30.
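The maximum capacity, used when either constructor that takes arguments implicitly specifies a higher value.
It must be a power of two no greater than 1 << 30.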
static final int MAXIMUM_CAPACITY = 1 << 30;
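A requested capacity has to be rounded up to a power of two and capped at MAXIMUM_CAPACITY. The following is a sketch of one way to do that rounding; the helper name `roundUpToPowerOfTwo` is hypothetical, and the real source may differ between JDK versions.

```java
final class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two >= requested, clamped to [1, MAXIMUM_CAPACITY]. */
    static int roundUpToPowerOfTwo(int requested) {
        // Sets every bit below the highest set bit of (requested - 1).
        int n = -1 >>> Integer.numberOfLeadingZeros(requested - 1);
        if (n < 0) {
            return 1; // requested was 0 or 1 (or negative)
        }
        return (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
}
```

The cap is 1 << 30 because that is the largest power of two that fits in a positive int.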
The load factor used when none specified in constructor.
The load factor used when the constructor does not specify one.
static final float DEFAULT_LOAD_FACTOR = 0.75f;
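The load factor determines how full the table may get before it is resized. As a rough sketch, assuming the resize threshold is simply capacity times load factor truncated to an int (the helper name `threshold` is hypothetical):

```java
final class ResizeThresholdSketch {
    /** Number of entries at which a table of the given capacity would be resized. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default capacity 16 with default load factor 0.75f.
        System.out.println(threshold(16, 0.75f));
    }
}
```

With the defaults this gives 12, so a default-sized table grows well before every bin is occupied.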
The bin count threshold for using a tree rather than list for a bin. Bins are converted to trees when adding an element to a bin with at least this many nodes. The value must be greater than 2 and should be at least 8 to mesh with assumptions in tree removal about conversion back to plain bins upon shrinkage.
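The bin count threshold for using a tree instead of a list for a bin. A bin is converted to a tree when an element is added to a bin that already has at least this many nodes. The value must be greater than 2, and should be at least 8 so that it meshes with the assumptions tree removal makes about converting back to plain bins upon shrinkage.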
static final int TREEIFY_THRESHOLD = 8;
The bin count threshold for untreeifying a (split) bin during a resize operation. Should be less than TREEIFY_THRESHOLD, and at most 6 to mesh with shrinkage detection under removal.
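The bin count threshold for untreeifying a (split) bin during a resize operation.
It should be less than TREEIFY_THRESHOLD, and at most 6 so that it meshes with shrinkage detection under removal.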
static final int UNTREEIFY_THRESHOLD = 6;
The smallest table capacity for which bins may be treeified. (Otherwise the table is resized if too many nodes in a bin.) Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between resizing and treeification thresholds.
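The smallest table capacity at which bins may be treeified. (Below this capacity, the table is resized instead when a bin holds too many nodes.) It should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts between the resizing and treeification thresholds.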
static final int MIN_TREEIFY_CAPACITY = 64;
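The three thresholds interact, and a small decision sketch may make that easier to follow. This is a paraphrase of the comments above, not the actual control flow of HashMap; the class, enum, and method names are all hypothetical.

```java
final class BinPolicySketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY }

    /** What to do when adding an element to a list bin that already holds the given number of nodes. */
    static Action onInsert(int nodesAlreadyInBin, int tableCapacity) {
        if (nodesAlreadyInBin < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // A crowded bin in a small table is handled by growing the table instead.
        return (tableCapacity < MIN_TREEIFY_CAPACITY) ? Action.RESIZE_TABLE : Action.TREEIFY;
    }

    /** Whether a tree bin produced by splitting during resize should revert to a plain list. */
    static boolean shouldUntreeifyAfterSplit(int nodesInSplitBin) {
        return nodesInSplitBin <= UNTREEIFY_THRESHOLD;
    }
}
```

The gap between 6 and 8 means a bin holding exactly 7 nodes is neither converted to a tree nor converted back, which keeps a bin hovering near the boundary from flipping between the two forms.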