HashMap

To be continued…

HashMap capacity

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default initial capacity is 16.
A HashMap's capacity is always a power of two (2^n), for two reasons:

  • HashMap locates an element's bin via hash % n. When n is a power of two, hash % n == hash & (n - 1), and the bitwise AND is faster than the % operator.
  • When a HashMap resizes, its capacity doubles, and each element's new bin depends on the one extra bit of its hash that the larger mask exposes: if that bit is 0 the index stays the same; if it is 1 the new index is the old index + oldCap.
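Both facts can be checked directly. The following is a standalone sketch (the class name PowerOfTwoDemo and the sample hash value are my own, not from the JDK):

```java
// Sketch (not JDK source): why a power-of-two capacity lets
// hash % n collapse to hash & (n - 1), and how resizing splits a bin.
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int oldCap = 16;                 // power of two
        int hash = 0x7A2B5;              // an arbitrary hash value

        // For power-of-two n, hash % n == hash & (n - 1)
        System.out.println(hash % oldCap == (hash & (oldCap - 1))); // true

        // After doubling to 2 * oldCap, the new index depends only on
        // the bit (hash & oldCap): 0 keeps the old index, 1 adds oldCap.
        int oldIdx = hash & (oldCap - 1);
        int newIdx = hash & (2 * oldCap - 1);
        int expected = (hash & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
        System.out.println(newIdx == expected); // true
    }
}
```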

HashMap's underlying data structure

  • JDK1.8
/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    ...
}

/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    ...
}

Array + linked list + red-black tree.
A HashMap key's hashCode is inherited from Object by default; in general it should be overridden (together with equals).
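A sketch of such an override (the Point class is my own example, not from the JDK):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// A key class that overrides hashCode and equals together, so that
// logically-equal keys hash to the same bin and compare equal.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() {
        return Objects.hash(x, y);  // must be consistent with equals
    }
}

public class KeyDemo {
    public static void main(String[] args) {
        Map<Point, String> m = new HashMap<>();
        m.put(new Point(1, 2), "a");
        // With the overrides, a second equal instance finds the entry;
        // with Object's identity-based hashCode it would not.
        System.out.println(m.get(new Point(1, 2))); // a
    }
}
```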

  1. Compute the hash from the key's hashCode via the perturbation function (XOR the hashCode's high 16 bits into its low 16 bits);
  2. Compute the element's array index as hash % n (i.e. hash & (n - 1); the equivalence holds because n is a power of two);
  3. If that slot already holds elements, compare the new element's hash and key against each existing one: if both match, overwrite the value; otherwise append the new node to the end of the chain;
  4. When a chain grows longer than the threshold (default 8), convert it to a red-black tree (but only if the array length is at least 64; otherwise resize instead).
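Steps 1 to 3 can be sketched as a toy map. MiniMap is my own illustrative name; it omits treeification and resizing, and prepends new nodes where JDK 8 appends, so it is not the JDK implementation:

```java
import java.util.Objects;

// Toy sketch of HashMap's put/get flow: spread the hash, mask to an
// index, then walk the bin's chain comparing hash and key.
public class MiniMap<K, V> {
    static final class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16];

    static int hash(Object key) {               // step 1: spread high bits
        int h;
        return key == null ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = h & (table.length - 1);         // step 2: index by mask
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;                // step 3: same key -> overwrite
                return old;
            }
        }
        Node<K, V> head = new Node<>(h, key, value);
        head.next = table[i];                   // prepend for simplicity
        table[i] = head;                        // (JDK 8 appends to the tail)
        return null;
    }

    public V get(K key) {
        int h = hash(key);
        for (Node<K, V> n = table[h & (table.length - 1)]; n != null; n = n.next)
            if (n.hash == h && Objects.equals(n.key, key)) return n.value;
        return null;
    }

    public static void main(String[] args) {
        MiniMap<String, Integer> m = new MiniMap<>();
        m.put("one", 1);
        m.put("one", 11);                    // same key: overwrite
        System.out.println(m.get("one"));    // 11
    }
}
```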

Why the perturbation function (why not use hashCode directly)

The JDK comment:

/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don’t benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/

The perturbation function XORs the hashCode's high 16 bits into its low 16 bits (one shift plus one XOR).
The comment gives the rationale:
① It is a tradeoff between speed, utility, and quality of bit-spreading.
② Mixing the high bits into the low bits strengthens the randomness of the low bits.
③ Without the mix, the table bounds (initially 16) mean the high bits would never take part in index calculations.
④ Because many common sets of hashes are already reasonably distributed, and because trees handle large collision sets within a bin (the comment says "bin", one slot of the table, which you can also read as "bucket"), the cheapest possible transform (one shift plus one XOR) is used to reduce systematic lossage (which I read as hash collisions).
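For reference, the method this comment documents is, in JDK 8, the two-line hash() below; the demonstration values in main are my own:

```java
// JDK 8's actual perturbation (from java.util.HashMap), plus a small
// demonstration that keys differing only in high bits stop colliding
// once the high bits are spread downward.
public class HashSpread {
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000, b = 0x20000;     // differ only above bit 15
        // Raw hashCodes collide under the (n - 1) mask...
        System.out.println((a & (n - 1)) == (b & (n - 1))); // true
        // ...but the spread hashes do not.
        System.out.println((hash(a) & (n - 1)) == (hash(b) & (n - 1))); // false
    }
}
```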

Why is the default loadFactor 0.75?

The HashMap source begins with this comment:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).

This explains why the load factor defaults to 0.75: it is a tradeoff between time and space costs. A higher value reduces the space overhead but increases the lookup cost.
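Concretely, the load factor sets the resize threshold. A quick check with the JDK defaults (the class name ThresholdDemo is my own):

```java
// With default capacity 16 and load factor 0.75, a HashMap resizes
// once its size exceeds threshold = capacity * loadFactor.
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold); // 12
    }
}
```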

What is the Poisson distribution in the comment trying to show?

The Implementation notes in the HashMap source contain this comment:

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million

First, with well-distributed hashCodes, tree bins are rarely used. Ideally, under random hashCodes and the default load factor of 0.75, the number of nodes in a bin (bucket) follows a Poisson distribution with parameter about 0.5. The table in the comment shows that the probability of a bin reaching k = 8 nodes is vanishingly small, so conversion to a red-black tree is very unlikely; and when a bin does convert, it very likely holds enough nodes to make the tree worthwhile.
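The comment's table can be reproduced numerically from P(k) = e^(-0.5) * 0.5^k / k! (a sketch; the class name PoissonCheck is my own):

```java
// Reproduce the JDK comment's table: Poisson(lambda = 0.5)
// probabilities for bin sizes k = 0..8, computed iteratively via
// P(k+1) = P(k) * lambda / (k + 1).
public class PoissonCheck {
    public static void main(String[] args) {
        double lambda = 0.5;
        double p = Math.exp(-lambda);        // P(0) = e^{-0.5}
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p);
            p *= lambda / (k + 1);
        }
    }
}
```

The k = 0 line prints 0.60653066 and the k = 8 line 0.00000006, matching the JDK comment.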

ConcurrentHashMap
