HashMap Source Code Translation

Source

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * <p>If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 *
 * <p>This class is a member of the
 * <a href="{@docRoot}/../technotes/guides/collections/index.html">
 * Java Collections Framework</a>.
 *
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 *
 * @author  Doug Lea
 * @author  Josh Bloch
 * @author  Arthur van Hoff
 * @author  Neal Gafter
 * @see     Object#hashCode()
 * @see     Collection
 * @see     Map
 * @see     TreeMap
 * @see     Hashtable
 * @since   1.2
 */

Translation

/**
 * Hash table based implementation of the Map interface. This implementation provides all of the
 * optional map operations, and permits null values and the null key. (The HashMap class is roughly
 * equivalent to Hashtable, except that it is not thread-safe and permits nulls.) This class makes
 * no guarantees as to the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * This implementation provides constant-time performance for the basic operations (get and put),
 * assuming the hash function disperses the elements properly among the buckets. Iteration over
 * the collection views requires time proportional to the "capacity" of the HashMap instance (the
 * number of buckets) plus its size (the number of key-value mappings). Thus, if iteration
 * performance matters, do not set the initial capacity too high (or the load factor too low).
 *
 * An instance of HashMap has two parameters that affect its performance: the initial capacity and
 * the load factor. The capacity is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created. The load factor is a
 * measure of how full the hash table is allowed to get before its capacity is automatically
 * increased. When the number of entries in the hash table exceeds the product of the load factor
 * and the current capacity, the hash table is resized (that is, its internal data structures are
 * rebuilt) so that it has approximately twice the number of buckets.
 *
 * As a general rule, the default load factor (0.75) offers a good tradeoff between time and space
 * costs. Higher values decrease the space overhead but increase the lookup cost (reflected in
 * most of the operations of the HashMap class, including get and put). The expected number of
 * entries in the map and its load factor should be taken into account when setting the initial
 * capacity, so as to minimize the number of resize operations. If the initial capacity is greater
 * than the maximum number of entries divided by the load factor, no resize operations will ever
 * occur (see the first sketch after this comment).
 *
 * If many key-value mappings are to be stored in a HashMap instance, creating it with a
 * sufficiently large capacity will allow the mappings to be stored more efficiently than letting
 * it perform automatic resizing as needed to grow the table. Note that using many keys with the
 * same hashCode is a sure way to slow down the performance of any hash table. To ameliorate the
 * impact, when keys are {@link Comparable}, this class may use the comparison order among keys to
 * help break ties.
 *
 * <strong>Note that this implementation is not thread-safe.</strong> If multiple threads access a
 * HashMap concurrently, and at least one of the threads modifies the map structurally, the map
 * must be locked externally. (A structural modification is any operation that adds or deletes one
 * or more mappings; merely changing the value associated with a key that the instance already
 * contains is not a structural modification.) This is typically accomplished by locking on some
 * object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap} method. This is best done at
 * creation time, to prevent accidental unsynchronized access to the map (the second sketch after
 * this comment demonstrates this):
 * <pre>Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * The iterators returned by all of this class's "collection view methods" are fail-fast: if the
 * map is structurally modified at any time after the iterator is created, in any way except
 * through the iterator's own remove method, the iterator will throw a
 * {@link ConcurrentModificationException}. Thus, in the face of concurrent modification, the
 * iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior
 * at some undetermined time in the future.
 *
 * Note that the fail-fast behavior of an iterator cannot be guaranteed, because, generally
 * speaking, it is impossible to make any hard guarantees in the presence of unsynchronized
 * concurrent modification. Fail-fast iterators throw {@link ConcurrentModificationException} on a
 * best-effort basis. Therefore, it would be wrong to write a program that depends on this
 * exception for its correctness: the fail-fast behavior of iterators should be used only to
 * detect bugs.
 *
 */
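
To make the capacity/load-factor rule above concrete, here is a minimal sketch (the class and variable names are my own, not from the JDK) that sizes a HashMap so that no resize ever happens for a known number of entries:

import java.util.HashMap;
import java.util.Map;

public class CapacitySizing {
    public static void main(String[] args) {
        int expectedEntries = 10_000;   // hypothetical workload size
        float loadFactor = 0.75f;       // the documented default

        // Per the javadoc: if initialCapacity > expectedEntries / loadFactor,
        // the table never needs to be resized while holding expectedEntries mappings.
        // (HashMap additionally rounds the capacity up to a power of two internally.)
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, Integer> map = new HashMap<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put("key-" + i, i);     // none of these puts triggers a resize
        }
        System.out.println(map.size()); // 10000
    }
}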

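And a sketch of the recommended wrapping plus the fail-fast behavior, again with illustrative names; since fail-fast is best-effort only, the exception is printed here purely to demonstrate it, never to be relied on for correctness:

import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        // Wrap at creation time, as the javadoc recommends, so no unsynchronized
        // reference to the backing HashMap ever escapes.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);

        Iterator<String> it = m.keySet().iterator();
        it.next();
        m.put("c", 3);  // structural modification after the iterator was created
        try {
            it.next(); // the fail-fast check fires here
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: " + e);
        }
    }
}
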
Source

/*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     */
Translation
    /*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but when bins get too large, they
     * are transformed into bins of TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when
     * applicable (simply by checking instanceof on a node). Bins of TreeNodes may be traversed
     * and used like any others, but additionally support faster lookup when overpopulated.
     * However, since the vast majority of bins in normal use are not overpopulated, checking for
     * the existence of tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode,
     * but in the case of ties, if two elements are of the same "class C implements Comparable<C>"
     * type, then their compareTo method is used for ordering. (We conservatively check generic
     * types via reflection to validate this -- see method comparableClassFor.) The added
     * complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys
     * either have distinct hashes or are orderable. Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods return poorly distributed
     * values, as well as those in which many keys share a hashCode, so long as they are also
     * Comparable (the second sketch after this comment constructs exactly this case). (If neither
     * of these applies, we may waste about a factor of two in time and space compared to taking
     * no precautions. But the only known cases stem from poor user programming practices that are
     * already so slow that this makes little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we use them only when bins
     * contain enough nodes to warrant the cost (see TREEIFY_THRESHOLD). And when they become too
     * small (due to removal or resizing) they are converted back to plain bins. In usages with
     * well-distributed user hashCodes, tree bins are rarely used. Ideally, under random
     * hashCodes, the frequency of nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on
     * average for the default resizing threshold of 0.75, although with a large variance because
     * of resizing granularity. Ignoring variance, the expected occurrences of list size k are
     * (exp(-0.5) * pow(0.5, k) / factorial(k)); the first values, reproduced by the first sketch
     * after this comment, are:
     *
     *       0:    0.60653066
     *       1:    0.30326533
     *       2:    0.07581633
     *       3:    0.01263606
     *       4:    0.00157952
     *       5:    0.00015795
     *       6:    0.00001316
     *       7:    0.00000094
     *       8:    0.00000006
     *       more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node. However, sometimes (currently only upon
     * Iterator.remove), the root might be elsewhere, but it can be recovered by following parent
     * links (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an argument (normally supplied by a
     * public method), allowing them to call each other without recomputing user hashCodes. Most
     * internal methods also accept a "tab" argument, which is normally the current table, but may
     * be a new or old one when resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep them in the same relative
     * access/traversal order (i.e., field Node.next) to better preserve locality, and to slightly
     * simplify the handling of splits and traversals that invoke iterator.remove. When using
     * comparators on insertion, to keep a total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as tie-breakers.
     *
     * The use of and transitions between plain and tree modes are complicated by the existence of
     * the subclass LinkedHashMap. See below for the hook methods defined to be invoked upon
     * insertion, removal and access, which allow LinkedHashMap internals to otherwise remain
     * independent of these mechanics. (This also requires that a map instance be passed to some
     * utility methods that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all
     * of the twisty pointer operations.
     */
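
The probability table above can be reproduced with a few lines of Java (my own sketch, not part of the JDK source):

public class PoissonBinSizes {
    public static void main(String[] args) {
        double lambda = 0.5;   // average bin load at the default 0.75 resize threshold
        double factorial = 1.0;
        for (int k = 0; k <= 8; k++) {
            if (k > 0) factorial *= k;
            // expected fraction of bins holding exactly k nodes:
            // exp(-0.5) * pow(0.5, k) / factorial(k)
            double p = Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
            System.out.printf("%d: %.8f%n", k, p);  // 0: 0.60653066, 1: 0.30326533, ...
        }
    }
}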

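Finally, a minimal sketch of the pathological case the notes call out: every key shares one hashCode, but the keys are Comparable, so once a bin grows large enough (past TREEIFY_THRESHOLD, which is 8 in the source) it is treeified and lookups in it stay O(log n) instead of degrading to O(n). The Key class here is hypothetical:

import java.util.HashMap;
import java.util.Map;

public class CollidingKeys {
    // Deliberately terrible hashCode, but Comparable, so tree bins can
    // order entries by compareTo instead of scanning a linked list.
    static final class Key implements Comparable<Key> {
        final int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int compareTo(Key other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(new Key(i), i);  // all 1000 entries land in the same bin
        }
        System.out.println(map.get(new Key(777)));  // served by the tree bin, prints 777
    }
}
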
I translated this myself and my ability is limited; some sentences read awkwardly because I did not fully understand them and did not know how to translate them, and some parts may be wrong. For reference only.
