Peeling Back the Layers of HashMap

Latest recommended article published on 2021-10-10 15:57:56
思维态度行动
Categories: Source Code Analysis, JavaSE. Tags: hashmap, linked list, java, data structures
Article link: https://blog.csdn.net/CodingNO1/article/details/105738674
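This post annotates the important methods of HashMap in detail and omits some non-essential parts for clarity. It covers HashMap's linked-list implementation but not the red-black tree portions.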
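Preface:
I have annotated some of HashMap's key methods. Most of the parts I did not annotate have been left out so that they do not get in the way of reading.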
package java.util;

import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.Serializable;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;

/**
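 * Insertion and lookup run in roughly O(1) time, provided the hash function spreads the elements evenly across the buckets.
 * The red-black tree operations are left for a later read-through and summary.
 * 
 * After reading the constructors and the put method, one thing stands out:
 * new HashMap() on its own does not allocate the bucket array; storage is allocated only on the first put.
 * 
 * The insert operation:
 * 1. If the bucket array table is empty, initialize it through the resize path.
 * 2. Check whether the key being inserted already exists; if it does, decide by condition whether the new value replaces the old one.
 * 3. If it does not exist, link the key-value pair into the chain, and decide from the chain length whether to convert the chain to a red-black tree.
 * 4. Check whether the number of key-value pairs exceeds the threshold; if it does, resize.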
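 * 
 * The four steps above can be sketched with a toy map. MiniMap and everything in it are hypothetical teaching names, not JDK code:
 * it uses chaining only (no red-black trees) and hard-codes a default capacity of 16 and a load factor of 0.75.

```java
import java.util.Objects;

// Hypothetical teaching class that follows the four put steps. It is NOT java.util.HashMap.
final class MiniMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    private Node<K, V>[] table;   // stays null until the first put (lazy allocation)
    private int size;
    private int threshold;

    int capacity() { return table == null ? 0 : table.length; }
    int size() { return size; }

    private static int spread(Object key) {
        int h = Objects.hashCode(key);
        return h ^ (h >>> 16);
    }

    V put(K key, V value) {
        if (table == null) resize();                       // step 1: initialize through resize
        int hash = spread(key);
        int i = (table.length - 1) & hash;
        Node<K, V> last = null;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;                           // step 2: key exists, replace the value
                e.value = value;
                return old;
            }
            last = e;
        }
        Node<K, V> node = new Node<>(hash, key, value);    // step 3: append at the tail
        if (last == null) table[i] = node; else last.next = node;
        if (++size > threshold) resize();                  // step 4: grow past the threshold
        return null;
    }

    V get(Object key) {
        if (table == null) return null;
        int hash = spread(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = table;
        int newCap = (old == null) ? 16 : old.length * 2;  // doubling keeps the length a power of two
        Node<K, V>[] fresh = (Node<K, V>[]) new Node[newCap];
        if (old != null) {
            for (Node<K, V> head : old) {
                Node<K, V> e = head;
                while (e != null) {
                    Node<K, V> next = e.next;
                    e.next = null;
                    int j = (newCap - 1) & e.hash;
                    if (fresh[j] == null) {
                        fresh[j] = e;
                    } else {
                        Node<K, V> t = fresh[j];
                        while (t.next != null) t = t.next; // tail append preserves chain order
                        t.next = e;
                    }
                    e = next;
                }
            }
        }
        table = fresh;
        threshold = (int) (newCap * 0.75f);
    }

    public static void main(String[] args) {
        MiniMap<String, Integer> m = new MiniMap<>();
        System.out.println(m.capacity());                  // 0
        m.put("a", 1);
        System.out.println(m.capacity());                  // 16
        for (int i = 0; i < 13; i++) m.put("k" + i, i);
        System.out.println(m.capacity());                  // 32
    }
}
```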
 * 
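 * Some points to note:
 * 1. HashMap has a constant MIN_TREEIFY_CAPACITY: the minimum table size at which a bin may be converted to a red-black tree.
 * Before a collision chain is converted to a red-black tree, the array capacity is checked. If the array is still small (less than MIN_TREEIFY_CAPACITY),
 * the collisions are attributed to the small table, so the chain is not converted and resize() is called to grow the HashMap instead.
 * So a bucket holding 8 elements does not turn into a red-black tree on that basis alone; the table capacity must also be at least 64.
 * 
 * 2. When does HashMap resize?
 * Does it always resize once size reaches 0.75 of the total capacity? Not necessarily; it depends on the JDK version. From 1.8 on, put resizes whenever size exceeds threshold (capacity * loadFactor), with no further condition.
 * In the 1.7 source, put resizes under the condition "(size >= threshold) && (null != table[bucketIndex])", so the second condition has to hold as well.
 * What is bucketIndex? Literally "bucket index": the position of the bucket where the new Entry is about to be stored. In short,
 * 1.7 resizes only when size >= threshold and the new key collides, that is, the bucket at index hash & (length-1) is already occupied.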
 * 
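 * 3. Some changes between 1.7 and 1.8:
 * Data structure:
 * JDK 1.7 uses an array plus linked lists, while 1.8 uses an array plus linked lists plus red-black trees.
 * Keys that map to the same bucket are chained in a linked list; once a chain reaches the default threshold of 8 (TREEIFY_THRESHOLD),
 * treeifyBin is called to convert the list to a red-black tree. Lookup in a red-black tree is O(log n), so the worst case for put/get is O(log n), against O(n) for a linked list.
 * Storage mechanism:
 * On a hash collision, JDK 1.7 uses separate chaining with head insertion, while 1.8 uses separate chaining with tail insertion plus red-black trees.
 * Head insertion is cheap, but it reverses the chain order on resize and, under concurrent resizing, can produce a cyclic list and an infinite loop. Tail insertion avoids that cycle, although HashMap is still not thread-safe.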
 * 
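 * 4. Why a red-black tree rather than an AVL tree?
 * A red-black tree relaxes the balance requirement, so inserts need fewer rotations than in a strictly balanced AVL tree, while lookups are still much faster than in an unbalanced binary search tree. It is the middle-ground choice.
 * 
 * 5. JDK 1.7 is built on an array plus singly linked lists (why not doubly linked lists?)
 * The lists exist only to resolve hash collisions. A singly linked list already does the job, and a doubly linked list would cost an extra pointer per node.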
 * 
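 * 6. Differences from Hashtable, and what to use with multiple threads:
 * Hashtable's storage structure and implementation are largely the same as HashMap's. The biggest difference is that Hashtable is thread-safe; it also rejects null keys and null values.
 * Hashtable is a legacy collection class and is not recommended in new code: use HashMap where thread safety is not needed, and ConcurrentHashMap where it is.
 * Another option is Map m = Collections.synchronizedMap(new HashMap(...));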
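 * 
 * A small check of the null-key behavior described above, using only standard java.util classes.
 * MapChoiceDemo and rejectsNullKey are hypothetical names introduced for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MapChoiceDemo {
    // Returns true if the map rejects a null key with a NullPointerException.
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey(new HashMap<>()));            // false
        System.out.println(rejectsNullKey(new Hashtable<>()));          // true
        System.out.println(rejectsNullKey(new ConcurrentHashMap<>()));  // true

        // The synchronized wrapper keeps HashMap semantics, so null keys are still allowed.
        Map<String, String> sync = Collections.synchronizedMap(new HashMap<>());
        System.out.println(rejectsNullKey(sync));                       // false
    }
}
```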
 * 
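 * 7. When you override equals you must also override hashCode. Why, and what does it have to do with HashMap?
 * The contract between equals and hashCode:
 *
 *     If two objects are equal (equals returns true), their hashCode values must be equal.
 *     If two objects have the same hashCode, they are not necessarily equal (equals may return false).
 *
 * While walking a chain, HashMap compares the stored hash before it calls equals. An overridden equals may carry complex business logic, and calling it on every node would slow lookups down, so a hash mismatch rules a node out immediately without calling equals at all. This noticeably improves HashMap's efficiency.
 *
 * So hashCode has to be overridden for HashMap and similar collections to work correctly: HashMap decides key equality by comparing the hash and then calling equals. If two equal objects return different hash codes, a lookup with one of them will not find the entry stored under the other.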
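 * 
 * A minimal sketch of what goes wrong, assuming two hypothetical key classes (BadKey and GoodKey) that differ only in whether hashCode is overridden.
 * The BadKey lookup depends on identity hash codes, so its result is described rather than asserted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EqualsHashCodeDemo {
    // Hypothetical key that overrides equals but not hashCode (breaks the contract).
    static final class BadKey {
        final String id;
        BadKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id.equals(id);
        }
    }

    // Hypothetical key that overrides both consistently.
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<Object, String> map = new HashMap<>();
        map.put(new GoodKey("a"), "good");
        map.put(new BadKey("a"), "bad");

        System.out.println(map.get(new GoodKey("a")));  // good
        // Two BadKey instances are equal, yet they almost always have different
        // identity hash codes, so this lookup typically prints null.
        System.out.println(map.get(new BadKey("a")));
    }
}
```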
 * 
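 * 8. If red-black trees are so good, why does HashMap not use them from the start, instead of converting only once a chain exceeds 8 nodes?
 * Because a red-black tree has to perform left and right rotations on insert, and a singly linked list does not.
 * Comparing a singly linked list with a red-black tree:
 * With fewer than 8 elements, a list is cheap to insert into, and its slower lookup hardly matters at that size.
 * With more than 8 elements, a tree is cheaper to search but more expensive to insert into.
 * As for why the number is 8: it is a tradeoff chosen by the JDK authors, much like the default loadFactor of 0.75. (The implementation notes in the JDK source justify it with a Poisson model: with well-distributed hash codes a bin almost never reaches 8 nodes.)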
 * 
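 * 9. Miscellaneous
 * Each resize doubles the previous capacity.
 * 
 * The underlying array length must be a power of two (a requested capacity that is not a power of two is rounded up to one by tableSizeFor):
 * First, when capacity is a power of two, the bucket index h & (length-1) equals h modulo length, at a lower cost.
 * Second, a power of two is even, so capacity-1 is odd and its lowest bit is 1 (in fact all of its low bits are 1),
 * which means the lowest bit of h & (capacity-1) can be 0 or 1 depending on h. The result can be even or odd, and that keeps the distribution uniform.
 * If capacity were odd, capacity-1 would be even and its lowest bit would be 0, so the lowest bit of h & (capacity-1) would always be 0.
 * Every hash would then land on an even index, wasting nearly half of the array.
 * 
 * How is the array index derived from the key? i = (length - 1) & hash, which acts like a remainder but is cheaper to compute.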
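 * 
 * The index arithmetic can be tried in isolation. The sketch below is written after the JDK 8 versions of tableSizeFor and the hash-spreading step;
 * IndexDemo, spread and indexFor are hypothetical names introduced for this illustration.

```java
final class IndexDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds cap up to the next power of two by smearing the highest set bit to the right.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Mixes the high 16 bits into the low 16 so they influence the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int length) {
        return (length - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));  // 16
        System.out.println(tableSizeFor(16));  // 16
        System.out.println(tableSizeFor(17));  // 32

        int hash = spread("hello".hashCode());
        // For a power-of-two length the mask and the floor modulus agree.
        System.out.println(indexFor(hash, 16) == Math.floorMod(hash, 16));  // true
    }
}
```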
 * 
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key. (Annotation: both null keys and null values are allowed.)   (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
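 * The paragraph above says that null keys and null values are permitted, and that HashMap is roughly equivalent to Hashtable except that it is unsynchronized and permits nulls.
 * It also says the class makes no guarantee about the order of the map, and in particular that the order may not stay constant over time. (My reading: a rehash can change the order.)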
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
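 * In short: get and put run in constant time, assuming the hash function disperses the elements properly among the buckets.
 * Iterating over a collection view takes time proportional to the HashMap's capacity (the number of buckets) plus its size (the number of key-value mappings).
 * So if iteration performance matters, do not set the initial capacity too high (or the load factor too low).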
 *
 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
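 * In short: two parameters affect a HashMap's performance, the initial capacity and the load factor. The capacity is the number of buckets in the hash table,
 * and the initial capacity is simply the capacity at creation time. The load factor measures how full the table may get before its capacity is automatically increased.
 * When the number of entries exceeds the product of the load factor and the current capacity,
 * the table is rehashed (its internal data structures are rebuilt) and ends up with roughly twice as many buckets.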
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
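 * In short: the default load factor is 0.75. A higher load factor lowers the space overhead but raises the lookup cost.
 * When choosing the initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations.
 * If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur.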
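 * 
 * The sizing rule in the last sentence can be written as a small helper. PresizeDemo and initialCapacityFor are hypothetical names;
 * the "+ 1" is an assumption made here to keep the result strictly greater than entries / loadFactor, as the Javadoc requires.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizeDemo {
    // Smallest convenient capacity that is strictly greater than expectedEntries / loadFactor.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int capacity = initialCapacityFor(expected, 0.75f);
        System.out.println(capacity);  // 134

        // HashMap rounds the requested capacity up to a power of two internally,
        // so these 100 puts should not trigger a rehash.
        Map<Integer, String> map = new HashMap<>(capacity, 0.75f);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println(map.size());  // 100
    }
}
```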
 *
 * <p>If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
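 * In short: when many mappings will be stored, creating the HashMap with a sufficiently large capacity stores them more efficiently than letting it rehash repeatedly as the table grows. Many keys with the same hashCode() slow any hash table down; when keys are Comparable, HashMap may use their ordering to help break ties.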
 * 
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 * The emphasis here is that HashMap is not synchronized.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *   This shows how to turn it into a synchronized container.
 *
 *
 * What follows mainly describes iterators and fail-fast behavior.
 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
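 * 
 * A single-threaded sketch of the fail-fast behavior, using hypothetical names (FailFastDemo, putDuringIterationThrows, removeEvenKeys).
 * It shows a structural modification during iteration and the safe alternative through the iterator's own remove method.
 * The exception is thrown on a best-effort basis, so real code should not rely on it for correctness.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class FailFastDemo {
    // Adds a new key while iterating; the next call to the iterator detects the change.
    static boolean putDuringIterationThrows() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        try {
            for (Integer k : map.keySet()) {
                map.put(100 + k, "x");  // structural modification: a new mapping
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator keeps the iterator and the map in sync.
    static Map<Integer, String> removeEvenKeys(Map<Integer, String> map) {
        Iterator<Integer> it = map.keySet().iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(putDuringIterationThrows());  // true

        Map<Integer, String> map = new HashMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");
        map.put(4, "d");
        System.out.println(removeEvenKeys(map));         // {1=a, 3=c}
    }
}
```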
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only