Reading the HashMap Source
JDK 1.8 HashMap
Class Javadoc
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
> HashMap is a hash-table-based implementation of Map; it permits null keys and null values
> HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls
> HashMap makes no guarantee about iteration order
*
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
*
> Provides the basic operations get and put in constant time, assuming a good hash function
> The hash function decides where data lands, dispersing elements among the buckets
> Buckets: the underlying structure is an array plus linked lists; the array slots hold the elements
> Iterating over the collection views takes time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings)
> So do not set the initial capacity too high, or the load factor too low, if iteration performance matters
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
*
> A HashMap instance has two parameters that affect its performance: initial capacity and load factor
> Initial capacity: the capacity is the number of buckets in the hash table, and the initial capacity is the one specified when the instance is created, e.g. HashMap<String, Object> map = new HashMap<>(16);
> Load factor: a measure of how full the hash table may get before its capacity is automatically increased
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
*
> The default load factor (0.75) offers a good trade-off between time and space costs <discussed in detail at the end of this article>
> Setting the load factor too high increases lookup cost
> When setting the initial capacity, take the expected number of entries and the load factor into account, so as to minimize the number of rehash operations
> If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash will ever occur
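The no-rehash rule above can be sketched in Java. `capacityFor` is a hypothetical helper, not part of the JDK; note also that HashMap rounds the requested capacity up to the next power of two internally.

```java
public class CapacitySizing {
    // Hypothetical helper: the smallest initial capacity that can hold
    // `expected` entries without a resize, per the Javadoc rule
    // "initial capacity > max entries / load factor => no rehash".
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / loadFactor);
    }

    public static void main(String[] args) {
        // To store 100 entries with the default 0.75 load factor,
        // request at least ceil(100 / 0.75) = 134 buckets up front.
        System.out.println(capacityFor(100, 0.75f)); // prints 134
    }
}
```

HashMap will round a requested 134 up to a 256-slot table, whose resize threshold (256 * 0.75 = 192) comfortably exceeds 100 entries, so no rehash occurs while filling it.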
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
> HashMap is not thread-safe: if multiple threads access the same map and at least one modifies it structurally, access must be synchronized externally (typically by synchronizing on some object that encapsulates the map)
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
> Map m = Collections.synchronizedMap(new HashMap(...));
> This wraps the new HashMap so that access to it is synchronized, keeping the data consistent across threads
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
> Unless the iterator's own remove method is used, any structural modification of the map during iteration makes the iterator throw ConcurrentModificationException
*
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*/
Source code
// In JDK 1.8 the underlying structure of HashMap is array + linked list + red-black tree
public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable {
private static final long serialVersionUID = 362498820763181265L;
/*
* An excerpt from the implementation notes:
* Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
> With 0.75 as the load factor, a chain longer than 8 at any collision slot is all but impossible
*/
/**
* The default initial capacity - MUST be a power of two.
* Default initial capacity of the map
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* Maximum capacity (2^30), used if a higher value is implicitly specified by either of the constructors with arguments
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
* Default load factor: 0.75, a good balance between time and space cost
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* Threshold at which a bin's linked list is converted to a red-black tree
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
* When the table is resized, a tree bin that has shrunk to 6 or fewer nodes is converted back into a linked list
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
* Treeification only happens once the table capacity reaches 64; below that, a crowded bin triggers a resize instead.
* This avoids needless conversions early in the table's life, when several keys may happen to collide in the same bin.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
}
put: the putVal() flow
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict){……}
get: the getNode() flow
final Node<K,V> getNode(int hash, Object key){……}
resize: the resize() flow
1. When copying a chain to the new table during resize, why does (node hash AND old capacity) == 0 tell us whether the node's index changes in the new table, and by how much?
The source uses (e.hash & oldCap) == 0 to decide whether e's index changes in the new table. Assume some constants:
old table capacity oldCap: 2^4 = 16; hash of e's key (low 8 bits shown, high 24 bits all 0): 11010011
hash & (oldCap - 1) gives e's index in the old table: 3
Computation:   11010011 (hash)
             & 00001111 (oldCap - 1 in binary)
             ——————————————
               00000011 = 3 (decimal)
hash & oldCap is non-zero, so e's index changes in the new table
Computation:   11010011 (hash)
             & 00010000 (oldCap in binary)
             ——————————————
               00010000 = 16 (decimal, non-zero)
hash & (newCap - 1) gives e's index in the new table: 19
Computation: newCap = oldCap << 1 = 32, so newCap - 1 = 31 (binary 11111)
               11010011 (hash)
             & 00011111 (newCap - 1 in binary)
             ——————————————
               00010011 = 19 (decimal)
Now keep everything else fixed but let e's hash be 11000011; the same computation gives hash & oldCap = 0 and hash & (newCap - 1) = 3, i.e. the index does not move.
From these examples, (e.hash & oldCap) == 0 tells us whether e's index changes in the new table, and when it does change, the new index is (old index + old capacity).
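The worked example can be checked directly in Java; the two hash values below are the ones assumed above:

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16;          // 2^4, old table length
        int newCap = oldCap << 1; // 32 after doubling

        int h1 = 0b11010011;      // bit 4 set: index moves
        int h2 = 0b11000011;      // bit 4 clear: index stays

        System.out.println(h1 & (oldCap - 1)); // 3,  old index
        System.out.println(h1 & oldCap);       // 16, non-zero: index moves
        System.out.println(h1 & (newCap - 1)); // 19 = 3 + oldCap, new index

        System.out.println(h2 & (oldCap - 1)); // 3,  old index
        System.out.println(h2 & oldCap);       // 0:  index stays
        System.out.println(h2 & (newCap - 1)); // 3,  new index unchanged
    }
}
```

In other words, only one extra bit of the hash (the bit at the old capacity's position) participates in the new index, which is why a node either stays put or moves by exactly oldCap.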
Interview questions
Why is HashMap's load factor 0.75?
- The load factor (also called fill factor) measures how full the hash table may get; the larger it is, the more entries the table holds before resizing
- HashMap resolves collisions by chaining, so lookup cost averages O(1 + n) where n is the chain length. A higher load factor uses space more fully at the cost of slower lookups; a lower one leaves the table sparse and wastes space
- 0.75 is HashMap's compromise between time and space
Why is a chain converted to a red-black tree once its length reaches 8?
From the source comment: with the default load factor of 0.75, node counts in bins follow a Poisson distribution with parameter about 0.5
Poisson formula: (exp(-0.5) * pow(0.5, k) / factorial(k))
In Java: Math.exp(-0.5) * Math.pow(0.5, k) / IntMath.factorial(k) (IntMath is Guava's; the JDK has no factorial)
With a well-dispersing hash function, the more nodes a bin already holds, the less likely the next node lands there:
- probability of 1 node in a bin: 0.3032653299
- probability of 2 nodes in a bin: 0.0758163325
- probability of 3 nodes in a bin: 0.0126360554
- probability of 4 nodes in a bin: 0.0015795069
- probability of 5 nodes in a bin: 0.0001579507
- probability of 6 nodes in a bin: 0.0000131626
- probability of 7 nodes in a bin: 0.0000009402
- probability of 8 nodes in a bin: 0.0000000588 // about 6 in 100 million
- probability of 9 nodes in a bin: 0.0000000033
The data show the probability of 8 nodes in one bin is about 6 in 100 million, under one in ten million. In other words, with 0.75 as the load factor, a chain longer than 8 at any collision slot is practically impossible, so under these conditions treeification essentially never happens.
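The table above can be reproduced with a few lines of Java; `p` is a small helper of my own implementing the Poisson formula from the source comment:

```java
public class PoissonBins {
    // P(k nodes in a bin) under Poisson(0.5): exp(-0.5) * 0.5^k / k!
    static double p(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p(k));
        }
        // k = 8 prints 0.00000006, matching the source comment's table
    }
}
```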
Poorly dispersing hash functions: converting a chain to a red-black tree once it exceeds 8 mainly guards against user-supplied hash functions that disperse badly, where chains grow long and lookups become slow
One might object that this still doesn't explain why 8 specifically
Linked list lookup: O(n); red-black tree lookup: O(log n)
Consider: with a good hash function, the Poisson numbers say an 8th node in a bin is already vanishingly unlikely, and with even dispersion and a modestly sized map you would never need to convert a list to a tree for lookup speed, since below length 8 the two structures cost about the same
- good hash function: a single bin almost never holds more than 8 entries
- bad hash function: once a bin exceeds 8 entries it is treeified (at that point the tree is a fallback that keeps lookups efficient in the extreme case)
Why does a tree revert to a linked list at length 6?
The gap between 8 and 6 provides hysteresis, avoiding frequent flipping between red-black tree and linked list
What is modCount for? Why was it designed?
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
transient int modCount;
From the comment we learn:
- it counts structural modifications
- structural modifications are those that change the number of mappings or otherwise modify the internal structure (e.g. HashMap's put of a new key, remove, rehash; ArrayList's add, remove, ...)
- it backs the fail-fast behavior of the collection-view iterators
Purpose: to detect that the underlying collection was changed mid-iteration for some reason, which could otherwise lead to unpredictable results, by throwing a concurrent modification exception. Before each next(), the iterator compares its own expectedModCount with the collection's modCount; if they differ it throws ConcurrentModificationException.
public static void main(String[] args) {
    HashMap<String, Integer> map = new HashMap<>(8);
    map.put("1", 1);
    map.put("2", 2);
    map.put("3", 3);
    // iterator  is created while modCount == 3, so its expectedModCount is 3
    // iterator1 is created while modCount == 4, so its expectedModCount is 4
    // iterator2 is created while modCount == 5, so its expectedModCount is 5
    for (Iterator<Map.Entry<String, Integer>> iterator = map.entrySet().iterator(); iterator.hasNext();) {
        Map.Entry<String, Integer> next = iterator.next();
        if ("1".equals(next.getKey())) {
            map.remove("1");  // modCount -> 4
            Iterator<Map.Entry<String, Integer>> iterator1 = map.entrySet().iterator();
            map.put("4", 4);  // modCount -> 5
            Iterator<Map.Entry<String, Integer>> iterator2 = map.entrySet().iterator();
        }
    }
}
This throws ConcurrentModificationException: the original iterator's expectedModCount is 3 while the map's modCount has become 5, so they are not equal.
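For completeness, a sketch of the safe variant: removing through the iterator itself updates expectedModCount together with modCount, so no exception is thrown.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SafeRemoval {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>(8);
        map.put("1", 1);
        map.put("2", 2);
        map.put("3", 3);

        // Iterator.remove() keeps expectedModCount in sync with modCount,
        // so iteration continues without ConcurrentModificationException.
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext();) {
            if ("1".equals(it.next().getKey())) {
                it.remove();
            }
        }
        System.out.println(map); // "1" is gone; "2" and "3" remain
    }
}
```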