HashMap source code walkthrough: the put, get, and resize methods

JDK 1.8 HashMap

Class comment

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.

> A hash-table-based implementation of the Map interface; HashMap permits a null key and null values.
> HashMap is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.
> HashMap makes no guarantees about ordering.
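The null-key and null-value behavior described above can be checked with a small sketch (class name `NullKeyDemo` is mine):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 1);     // the null key is hashed to 0, so it lands in bucket 0
        map.put("a", null);   // null values are allowed too
        System.out.println(map.get(null)); // 1
        System.out.println(map.get("a"));  // null
        // Hashtable, by contrast, throws NullPointerException for null keys or values
    }
}
```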

 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *

> Provides the basic operations get and put.
> The hash function decides where data is scattered, dispersing elements among the buckets.
> Buckets: HashMap is backed by an array plus linked lists; the array slots are where elements are placed.
> Iterating over the collection views takes time proportional to the HashMap's "capacity" (the number of buckets) plus its size (the number of key-value mappings).
> Therefore, do not set the initial capacity too high, or the load factor too low, if iteration performance matters.

 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 * 

> A HashMap instance has two parameters that affect its performance: the initial capacity and the load factor.
> Initial capacity: the capacity is the number of buckets in the hash table, and the initial capacity is the capacity specified when the instance is created, e.g. HashMap<String, Object> map = new HashMap<>(16);
> Load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.

 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 
> The default load factor (0.75) offers a good trade-off between time and space costs <discussed in detail at the end of this article>.
> Setting the load factor too high increases the lookup cost.
> When setting the initial capacity, take the expected number of entries and the load factor into account, so as to minimize the number of rehash operations.
> If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

 * <p>If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *

> HashMap is not thread-safe. If multiple threads access the same map and at least one of them modifies it structurally, access must be synchronized externally, typically by synchronizing on an object that encapsulates the map.

 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>


> Map m = Collections.synchronizedMap(new HashMap(...));
> This wraps the new HashMap() so that every access goes through a synchronized wrapper, keeping the data consistent across threads.
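A minimal sketch of the wrapper in action (class name `SyncMapDemo` and the key prefixes are mine): two threads write 1000 distinct keys each, and because every put is guarded by the wrapper's mutex, no updates are lost.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    public static void main(String[] args) throws InterruptedException {
        // Wrap at creation time so no unsynchronized reference to the map escapes
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());

        Runnable writerA = () -> { for (int i = 0; i < 1000; i++) map.put("a" + i, i); };
        Runnable writerB = () -> { for (int i = 0; i < 1000; i++) map.put("b" + i, i); };
        Thread t1 = new Thread(writerA);
        Thread t2 = new Thread(writerB);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Each put ran under the wrapper's lock, so all 2000 entries are present
        System.out.println(map.size()); // 2000
    }
}
```

Note that iteration over the wrapped map still requires manually synchronizing on it, per the Collections.synchronizedMap documentation.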

 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.

> Unless you use the iterator's own remove method, any structural modification of the map during iteration makes the iterator throw ConcurrentModificationException.

 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 */

Source walkthrough

// In JDK 1.8, HashMap is backed by an array + linked lists + red-black trees
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    private static final long serialVersionUID = 362498820763181265L;

    /*
     * (an excerpt from the source comment)
     * Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     > With 0.75 as the load factor, a chain of more than 8 nodes at any collision slot is all but impossible
     */

    /**
     * The default initial capacity - MUST be a power of two.
     * The map's default initial capacity
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * The maximum capacity (2^30), used if either of the constructors with arguments implicitly specifies a larger value
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * The default load factor, 0.75, which balances time and space costs well
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     * The threshold at which a bin's linked list is converted into a red-black tree
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     * When the table is resized, a tree bin that has shrunk to fewer than 6 nodes degenerates back into a linked list
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     * Treeification happens only once the table capacity reaches 64.
     * This avoids unnecessary conversions early in a table's life, when several
     * keys may happen to land in the same bin (the table is resized instead).
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
}
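Since the capacity must be a power of two, the constructor rounds any requested capacity up. The sketch below mirrors the bit-twiddling of JDK 8's `tableSizeFor` (the demo class name is mine):

```java
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Mirrors HashMap.tableSizeFor: the smallest power of two >= cap
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // subtract 1 so an exact power of two is not doubled
        n |= n >>> 1;      // smear the highest set bit downward...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until every bit below it is set
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```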

The put method: execution flow of putVal()

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { ... }

(flowchart: execution flow of putVal)
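Before `putVal` runs, the key's hashCode is spread by XOR-ing its high bits into its low bits. The sketch below mirrors JDK 8's `HashMap.hash()` and the `(n - 1) & hash` index computation used by both put and get (the demo class and sample values are mine):

```java
public class HashSpreadDemo {
    // Mirrors HashMap.hash(): XOR the high 16 bits into the low 16 bits so that
    // small tables, which only use the low bits, still feel the high bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int capacity = 16;                       // table length, always a power of two
        int index = (capacity - 1) & hash("hello"); // equivalent to hash % capacity, but faster
        System.out.println("bucket index: " + index);
        System.out.println(hash(null));          // 0: the null key always maps to bucket 0
    }
}
```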

The get method: execution flow of getNode()

final Node<K,V> getNode(int hash, Object key) { ... }

(flowchart: execution flow of getNode)

The resize method: execution flow of resize()

(flowchart: execution flow of resize)

1. When a chain is copied into the new table, why does (e.hash & oldCap) == 0 tell us whether the entry's index changes, and by how much?
The source uses (e.hash & oldCap) == 0 to decide whether e's index changes in the new table. Assume some constants:
old table capacity oldCap: 2^4 = 16; the hash of e's key (low 8 bits shown, high 24 bits all 0): 11010011
hash & (oldCap - 1) gives e's index in the old table: 3
Computation:     11010011 (hash)
               & 00001111 (oldCap - 1 in binary)
               ------------------------
                 00000011 = 3 (decimal)
hash & oldCap is nonzero, so e's index changes in the new table
Computation:     11010011 (hash)
               & 00010000 (oldCap in binary)
               ------------------------
                 00010000 = 16 (decimal), not 0
hash & (newCap - 1) gives e's index in the new table: 19
Computation: newCap = oldCap << 1 = 32, so newCap - 1 = 31 (11111)
                 11010011 (hash)
               & 00011111 (newCap - 1 in binary)
               ------------------------
                 00010011 = 19 (decimal)
Now keep everything else the same but let e's hash be 11000011. Computing as above gives hash & oldCap = 0 and hash & (newCap - 1) = 3, i.e. the index does not change.
These examples show that (e.hash & oldCap) == 0 determines whether e's index changes in the new table, and that when it does change, it moves by exactly the old capacity (new index = old index + oldCap).
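The arithmetic above can be checked directly in code (a verification sketch; the class name and hash values are the ones from the example):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;              // 10000 in binary
        int newCap = oldCap << 1;     // 32, so the new mask is 11111

        int hash = 0b11010011;                 // first example: bit 4 is set
        int oldIndex = hash & (oldCap - 1);    // 00000011 -> 3
        int newIndex = hash & (newCap - 1);    // 00010011 -> 19
        System.out.println((hash & oldCap) != 0);          // true: the index moves
        System.out.println(newIndex == oldIndex + oldCap); // true: it moves by exactly oldCap

        int hash2 = 0b11000011;                // second example: bit 4 is clear
        System.out.println((hash2 & oldCap) == 0);                            // true: index unchanged
        System.out.println((hash2 & (newCap - 1)) == (hash2 & (oldCap - 1))); // true: stays at 3
    }
}
```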

Interview questions

Why is HashMap's load factor 0.75?

  • The load factor measures how full the hash table is allowed to get; the larger it is, the more entries the table holds before resizing.
  • HashMap resolves collisions by chaining, so the average lookup cost is O(1 + α), where α is the load factor (the average chain length). A larger load factor uses space more fully but slows lookups; a smaller one leaves the table too sparse and seriously wastes space.
  • 0.75 is HashMap's compromise between time and space.

Why is a chain promoted to a red-black tree at length 8?

From the source comment we know HashMap's load factor is 0.75, under which the number of nodes per bin follows a Poisson distribution with parameter about 0.5.
Poisson formula: (exp(-0.5) * pow(0.5, k) / factorial(k))
Math.exp(-0.5) * Math.pow(0.5, k) / IntMath.factorial(k)
With a well-dispersing hash function, the more nodes a bin already holds, the lower the probability of the next one landing there:

  • Probability that a bin holds 1 node: 0.3032653299
  • Probability that a bin holds 2 nodes: 0.0758163325
  • Probability that a bin holds 3 nodes: 0.0126360554
  • Probability that a bin holds 4 nodes: 0.0015795069
  • Probability that a bin holds 5 nodes: 0.0001579507
  • Probability that a bin holds 6 nodes: 0.0000131626
  • Probability that a bin holds 7 nodes: 0.0000009402
  • Probability that a bin holds 8 nodes: 0.0000000588 // about six in a hundred million
  • Probability that a bin holds 9 nodes: 0.0000000033

The data show that the probability of 8 nodes in one bin is about six in a hundred million, i.e. less than one in ten million. In other words, with 0.75 as the load factor, a chain longer than 8 at any collision slot is essentially impossible, so treeification basically never happens.
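The probabilities above can be reproduced from the formula in the source comment. The sketch below uses only java.lang.Math and a hand-rolled factorial in place of Guava's IntMath.factorial (the class name is mine):

```java
public class PoissonDemo {
    // P(a bin holds exactly k nodes) under Poisson(0.5), per the HashMap source comment
    static double poisson(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(k));
        }
        // k = 8 prints 0.00000006: about six in a hundred million
    }
}
```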


With a poorly dispersing hash function: converting chains longer than 8 into red-black trees is mainly a defense against user-supplied bad hash functions producing very long chains and degrading query performance.

Some may say this still does not explain why 8.
Linked-list lookup: O(n); red-black tree lookup: O(log n).
Consider: with a well-written hash function, the Poisson distribution says a bin is extremely unlikely to ever receive its 8th node. With data spread evenly and a map that is not huge, there is no point converting chains to trees to speed up lookups, since below length 8 the two structures hardly differ in practice.

  • Good hash function: a single bin almost never holds more than 8 entries.
  • Bad hash function: once a bin exceeds 8 entries it is treeified (the red-black tree is a fallback that keeps lookups efficient in the worst case).

Why does a tree revert to a linked list at length 6?

The gap between 8 and 6 acts as a buffer, preventing frequent switching back and forth between red-black tree and linked list.

What is modCount for, and why was it designed?

 /**
     * The number of times this HashMap has been structurally modified.
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

From the comment we learn:

  • It records the number of structural modifications.
  • Structural modifications are those that change the number of mappings or otherwise modify the internal structure (e.g. HashMap's put of a new key, remove, rehash...; ArrayList's add, remove...).
  • It backs the fail-fast behavior of the iterators over the collection views.

Purpose: to guard against the underlying collection being changed during iteration, which could otherwise produce unpredictable results; a concurrent-modification error is thrown instead. Before fetching the next element, the iterator compares its own expectedModCount with the collection's modCount; if they differ, it throws ConcurrentModificationException.

public static void main(String[] args) {
        HashMap<String, Integer> map=new HashMap<>(8);
        map.put("1", 1);
        map.put("2", 2);
        map.put("3", 3);

        // iterator expectedModCount 3
        // iterator1 expectedModCount 4
        // iterator2 expectedModCount 5
        for(Iterator<Map.Entry<String, Integer>> iterator = map.entrySet().iterator(); iterator.hasNext();){
            Map.Entry<String, Integer> next = iterator.next();
            if ("1".equals(next.getKey())) {
                map.remove("1");
                Iterator<Map.Entry<String, Integer>> iterator1 = map.entrySet().iterator();
                map.put("4", 4);
                Iterator<Map.Entry<String, Integer>> iterator2 = map.entrySet().iterator();
            }
        }
    }

This throws ConcurrentModificationException: the iterator's expectedModCount is 3 while the map's modCount is 5, so they are not equal.
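The safe way to remove during iteration is the iterator's own remove(), which resets expectedModCount after each structural change (a sketch; the class name is mine):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SafeRemoveDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("1", 1);
        map.put("2", 2);
        map.put("3", 3);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if ("1".equals(it.next().getKey())) {
                it.remove(); // updates expectedModCount, so no exception is thrown
            }
        }
        System.out.println(map.size()); // 2
        // Since Java 8 the same effect is available as:
        // map.entrySet().removeIf(e -> "1".equals(e.getKey()));
    }
}
```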
