The Design Rationale Behind HashMap's Constants

芝麻开花不开门

Article link: https://blog.csdn.net/Luyanc/article/details/100795823

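What constants does HashMap define, and why were they given the values they have? This post looks at the design choices made by Doug Lea, Josh Bloch, Arthur van Hoff, and Neal Gafter. (Everything below is based on JDK 1.8.)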

Constant design

(1) HashMap's default initial capacity is 1 << 4 (i.e. 16)

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The comment on this constant says "MUST be a power of two". Why does it have to be a power of two?

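HashMap is backed by an array of bins, each holding a linked list (or a red-black tree). When an element is added, its bin index is computed as i = (n - 1) & hash. When the table length n is a power of two, this is equivalent to hash % n for non-negative hash values (and to Math.floorMod(hash, n) in general). Bin indices are usually computed with a remainder, but a bitwise AND (&) is cheaper than a remainder (%) operation, so the capacity must be a power of two in order to use the faster AND.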
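The equivalence between masking and taking a remainder can be checked directly. The following is a minimal sketch, not HashMap's own code: the class name IndexDemo and the helper indexFor are made up for illustration, and the extra hash-spreading step that HashMap applies before indexing is left out.

```java
class IndexDemo {
    // Hypothetical helper: bin index computed as (n - 1) & hash.
    // Assumes n is a power of two, as HashMap requires.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // DEFAULT_INITIAL_CAPACITY
        int[] hashes = {0, 5, 16, 37, 1 << 20, Integer.MAX_VALUE, -1, -37};
        for (int h : hashes) {
            // Math.floorMod always returns a value in [0, n), like the mask does.
            System.out.println(h + " -> mask: " + indexFor(h, n)
                    + ", floorMod: " + Math.floorMod(h, n));
        }
    }
}
```

For negative hashes the plain % operator would return a negative value, which is why the comparison uses Math.floorMod.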

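Why is the default 16 rather than 8 or 32? If it is too small, the map has to resize frequently; if it is too large, it wastes memory. 16 is the JDK's default trade-off between the two.

(2) HashMap's maximum capacity is 1 << 30, i.e. 2 to the power 30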

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

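An int occupies 4 bytes of 8 bits each, i.e. 32 bits, so it might seem that 1 could be shifted left by 31 bits to give 2 to the power 31. Why is the maximum not 2 to the power 31? Because the leftmost bit of a Java int is the sign bit, which indicates whether the value is positive or negative. Consider the following example: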

    System.out.println(1 << 30); // 2^30
    System.out.println(1 << 31); // sets the sign bit
    System.out.println(1 << 32); // shift distance wraps around
    System.out.println(1 << 33); // shift distance wraps around

Output:

1073741824
-2147483648
1
2

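1 << 31 overflows into the sign bit and becomes negative, so the largest power of two an int can hold is 2 to the power 30, and that is HashMap's maximum capacity. (1 << 32 and 1 << 33 print 1 and 2 because Java uses only the low five bits of the shift distance for an int, so they behave like 1 << 0 and 1 << 1.)

(3) HashMap's default load factor is 0.75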

    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor measures how full the hash table is allowed to get before it is resized. Here is how the source explains it:

 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.

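In other words, the default load factor of 0.75 generally strikes a good balance between time and space costs. Higher values reduce the space overhead but increase the lookup cost (which shows up in most HashMap operations, including get and put). When choosing an initial capacity, take the expected number of entries and the load factor into account so as to minimize the number of rehash operations. If the initial capacity is greater than the expected maximum number of entries divided by the load factor, no rehash will ever occur.

(4) HashMap's default treeify threshold (list converted to red-black tree) is 8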
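The sizing rule from the Javadoc can be turned into a small helper. This is a sketch under assumptions: the class CapacityDemo and the method initialCapacityFor are hypothetical names, and the "+ 1" is simply one way of making the capacity strictly greater than entries divided by load factor.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    // Hypothetical helper: an initial capacity strictly greater than
    // expectedEntries / loadFactor, per the rule quoted from the Javadoc.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(expected, 0.75f));
        for (int i = 0; i < expected; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println("requested capacity: " + initialCapacityFor(expected, 0.75f));
        System.out.println("entries stored: " + map.size());
    }
}
```

Sizing the map this way trades a little extra memory up front for avoiding resize work while the expected entries are inserted.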

    /**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

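Starting with Java 8, HashMap's underlying data structure includes red-black trees: when an element is added to a bin whose list already holds at least 8 nodes, the list is converted to a red-black tree (provided the table is large enough; see MIN_TREEIFY_CAPACITY later in this post). So why is the threshold 8? Look at this comment in the HashMap source: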

     * Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million

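Ideally, with random hash codes and the default load factor of 0.75, the number of nodes per bin follows a Poisson distribution with a parameter of about 0.5, although resizing granularity introduces a large variance. The table shows that the probability of a bin holding 8 nodes is about 0.00000006, i.e. less than one in ten million, which is why 8 was chosen as the threshold for converting a list to a red-black tree.

(5) HashMap's untreeify threshold (tree converted back to list) is 6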
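The figures in the source comment can be reproduced from the formula it gives, exp(-0.5) * pow(0.5, k) / factorial(k). A minimal sketch follows; the class PoissonDemo and the method poisson are illustrative names, not part of the JDK.

```java
class PoissonDemo {
    // Poisson probability mass function: exp(-lambda) * lambda^k / k!
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the parameter quoted in the HashMap source comment.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```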

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

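The treeify threshold is 8, so why is the untreeify threshold 6 rather than 7? The gap prevents frequent conversions between list and tree. If it were 7, a bin whose size keeps hovering around 8 as elements are inserted and removed would be converted from tree to list and back again over and over, which is very inefficient. (As the source comment above says, this threshold is checked when a tree bin is split during a resize.)

(6) HashMap's minimum treeify capacity is 64

    /**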
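The effect of the gap can be illustrated with a toy model. The sketch below is not HashMap's actual conversion logic: it simply counts how often a single bin would switch representation as its size oscillates, and the class HysteresisDemo and the method countConversions are hypothetical.

```java
class HysteresisDemo {
    // Toy model: a bin becomes a tree when its size reaches treeifyAt and
    // goes back to a list when its size drops to untreeifyAt or below.
    static int countConversions(int[] sizes, int treeifyAt, int untreeifyAt) {
        boolean isTree = false;
        int conversions = 0;
        for (int size : sizes) {
            if (!isTree && size >= treeifyAt) {
                isTree = true;
                conversions++;
            } else if (isTree && size <= untreeifyAt) {
                isTree = false;
                conversions++;
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        int[] sizes = {7, 8, 7, 8, 7, 8, 7, 8}; // bin size hovering around 8
        System.out.println("untreeify at 7: " + countConversions(sizes, 8, 7));
        System.out.println("untreeify at 6: " + countConversions(sizes, 8, 6));
    }
}
```

In this model a threshold of 7 converts the bin on almost every step, while a threshold of 6 converts it once and then leaves it alone.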
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

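Why 64? When the table capacity is below 64, the table itself is small, so hash collisions are relatively likely and long lists are somewhat more likely to appear. A long list that arises for this reason is better handled by resizing the table than by an unnecessary treeification, so below this capacity HashMap resizes instead of treeifying the bin.

Reference: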
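The decision described above can be summarized in a few lines. This is a simplified sketch of the rule as the source comment states it, not a copy of the JDK code; the class TreeifyDecisionDemo and the method actionForBin are made-up names, and the constants are redeclared locally so the example is self-contained.

```java
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Hypothetical helper: what happens to a bin that already holds
    // nodesInBin nodes when another element is added to it.
    static String actionForBin(int tableCapacity, int nodesInBin) {
        if (nodesInBin < TREEIFY_THRESHOLD) {
            return "keep list";
        }
        // Small table: the long list is blamed on the table size, so resize.
        return tableCapacity < MIN_TREEIFY_CAPACITY ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(actionForBin(16, 8));
        System.out.println(actionForBin(64, 8));
        System.out.println(actionForBin(64, 3));
    }
}
```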

https://mp.weixin.qq.com/s/aU7aQmSaw7TuLL9ZF-dLgg
