HashMap源码，你认真读了吗？

最新推荐文章于 2024-07-22 14:15:06 发布

一只程序汪

最新推荐文章于 2024-07-22 14:15:06 发布

阅读量250

点赞数 4

分类专栏：源码分析文章标签： java hashmap 数据结构面试

本文链接：https://blog.csdn.net/qwer0794/article/details/105450935

版权

源码分析专栏收录该内容

2 篇文章 0 订阅

订阅专栏

HashMap源码之你不知的几个秘密

一、写在前面

很多人说自己看过源码。可是你有认真思考过吗？
本程序汪将带你进入神秘的源码世界，一探究竟。

在这里插入图片描述

二、HashMap结构

哈希表是一种经典的数据结构，它的实现是基于哈希值的桶(bin)和链表。
优点：O(1)的平均查找、删除、插入的时间（注意是平均）
缺点：哈希碰撞

举个例子：
大家上学时都用过新华字典吧？
假设我们想查个汉字"方"。
（1）找到F的目录
（2）在F目录下找到“方”
（3）翻到“方”的那一页去

这个场景就有点像Hash表的结构原理了，你把目录A-Z想象成很多个桶,把F目录想象成其中的一个桶，F目录下很多汉字想象成链表。这样查一个东西是不是很方便？

为了方便大家想象，我画了一个图。
在这里插入图片描述

这个桶通常就是由数组实现的，因为数组无论有多少个，查找时间总是O(1)。

说一下哈希表的软肋吧。
就像刚才讲的那样，如果桶下有很多元素就容易造成Hash碰撞。比如F目录下有很多相同首字母的汉字，这样你再加个新汉字也是F首字母“放”，“房”，“防”等等，脑补下。
那我们字典是怎么做的，按一定排序顺序排下来了，这样形成了一个“链表”结构。查起来就方便拉。
链表这样有前后节点的数据结构，当有相同hash值的数据插入时，就在某个桶下挂个链表，这样就解决哈希碰撞的问题。你来一个相同的hash值数据，我就往链表下面挂一个。

当然了，我上面所讲的是经典的哈希表实现方式，HashMap的实现，优化了经典的hash表数据结构。

三、HashMap的常见面试问题

拷问灵魂深处的几个问题

HashMap的数据结构是怎样的？为什么这样设计？
HaspMap的数组大小为什么一定要是2的幂？
HashMap默认容量多少？为什么设置成这个？
HashMap的加载因子是怎么回事？
HashMap如何扩容？
HashMap是线程安全的吗？为什么？
…

这要是面试，你是不是已经懵逼了？
平时就TM会用，put、get、遍历。大家都是粗来混的嘛，给点面子好不好？

我摊牌了，不装了。面对上面几个问题，你别慌。打开IDEA，导入java源码，我们一步步来看。我们干入主题！！！

四、手撕HashMap源码

通常讲到HashMap都要涉及jdk1.7 和 jdk1.8，所以接下来，我将从1.7和1.8讲解。我现在就站在读者的角度，我是个菜鸟，不过我是个有技巧的小菜逼。

1、JDK 1.7 HashMap

看源码有技巧的，以我个人经验呢。我一般会先看全局变量，在看构造器，最后看方法。我们就按照这个步骤来。

	/**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

我们第一眼，就是看到这个全局变量。注解上说是默认初始化容量，特意说明一定要是2的幂。然后，发现这个值是1左移4位，即为16。
AKA，是一个网络流行词，是“Also Known As”的缩写，意思是又名、亦称、也被称为。我心想，TM，这开发者还是个弄潮儿。

(1) 这里为什么不直接写16？
答：计算机属于二进制世界产物，计算机位运算跟我们人类十进制世界加减乘除一样简单。你直接写个16反而它要算一次存下来。

(2) 为什么一定是2的幂？
答：这是面试经常问到的，要解开这个问题，需要涉及到下面几个问题。稍后为你揭晓。

读者到这里，发现一行代码这么多学问？还有，为什么你能看到和想到这么多细节。这就是平时的思维方式，多想一个为什么，你将获得不少收获。

	 /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

最大容量2^30，设计师对HashMap做了限制，最大不能超过这么多，估计也没有一个HashMap能存这么多东西了吧。

 	/**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

默认加载因子0.75f,注解上说在构造器中未指定时使用的负载系数。
我们去看一眼默认构造器。

	/**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

确实是，默认构造器调用了一个有参构造器，第一个参数是默认初始化容量，第二个是默认加载因子。
那不就是我们平时new HashMap()的时候，其实是调用new HashMap(16,0.75f);

好的，我们的问题又来了。
问题:为什么加载因子是0.75f ，而不是0.8f,0.5f呢？是不是随便给的？
答：小伙子你要相信科学，听我娓娓道来。
先看下HashMap这个类上面的注释。

 *<p>As a general rule, the default load factor (.75) offers a good tradeoff
 * between time and space costs.  Higher values decrease the space overhead
 * but increase the lookup cost (reflected in most of the operations of the
 * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>).  The
 * expected number of entries in the map and its load factor should be taken
 * into account when setting its initial capacity, so as to minimize the
 * number of rehash operations.  If the initial capacity is greater
 * than the maximum number of entries divided by the load factor, no
 * rehash operations will ever occur.

通常，默认负载因子（.75）提供了很好在时间和空间成本之间的权衡。较高的值可减少空间开销，但增加了查找成本。设置初始容量时应考虑到这一点，以最大程度地减少重新哈希操作的次数。如果初始容量比桶(entry节点)的数量除以负载系数的值还大，重新哈希操作还会发生。
也就是说值太小，浪费空间，值太多浪费时间开销。权衡后，设置为0.75f最好。

我们继续看全局变量。

	/**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

看到这里，你就想，之前讲哈希表是数组加链表实现的。
这里的Entry<K,V>[] table不就是他数组的结构嘛。这里就看到哈希表的影子了，不过这里的数组实现也有他自己实现的技巧。有兴趣的可以去看下，我就不多说了。
还有这里再一次提到了这个表一定要是2的幂。

不急，我们把其他几个变量看完先。

	 /**
     * The number of key-value mappings contained in this map.
     */
    transient int size; //这里的size就是hashMap的size
    

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold; //这里设计师自己注解了，所以这个值就是默认容量*负载因子，并指出初始化的hashMap为空时值为12。最后还特意说了，扩容时表将重新创建。

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;//修改次数，遍历时修改map里面的元素，可能会抛出ConcurrentModificationException，与快速失败有关。

    /**
     * The default threshold of map capacity above which alternative hashing is
     * used for String keys. Alternative hashing reduces the incidence of
     * collisions due to weak hash code calculation for String keys.
     * <p/>
     * This value may be overridden by defining the system property
     * {@code jdk.map.althashing.threshold}. A property value of {@code 1}
     * forces alternative hashing to be used at all times whereas
     * {@code -1} value ensures that alternative hashing is never used.
     */
    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE; //哈希默认最大阈值，具体看注解，设计师还提出了怎样去修改这个值，你可以看下HashMap内部类Holder

看完了全局变量，接下来就是构造器了。上面有提到默认构造器实际上，调用传入两个默认参数进行初始化HashMap。那我们就直接看这个构造器吧。

	/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

上面这些相信你们一定能看懂，主要看init()方法，我跟进去，发现是个空方法，其实这个方法是给LinkedHashMap使用的，因为它继承了HashMap。

构造器没什么，但是你要想，既然这个双参构造器为public，那我传入一个不是2次幂的值会咋样。嘿嘿嘿，我就喜欢跟设计师反着干。
你能想到的，设计师也能想到。

构造器看完了，就去看方法，我们常用的方法就是PUT GET。

HashMap的PUT方法

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
        //如果为空表，就初始化一个指定阈值的表，详见下面inflateTable解析
            inflateTable(threshold);
        }
        //如果key为null的逻辑，所以说HashMap的key可以为空，详见下面的putForNullKey方法
        if (key == null)
            return putForNullKey(value);
        //对key进行hashCode计算，这个方法很重要。详见hash方法解析
        int hash = hash(key);
        //indexFor又是一个很重要的方法，与2的幂相关。详见indexFor方法解析
        int i = indexFor(hash, table.length);
        //找到了桶的位置，就遍历桶里面的元素吧。
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        //没有的时候就put进去呗
        addEntry(hash, key, value, i);
        return null;
    }

这里有个问题，为啥put要去判断是不是为空呢？
当你new HashMap的时候，我们刚才看了构造器，他基本除了判断下参数就没干啥了。那说明什么啊？hashMap是在第一次put的时候初始化的！！！
为什么这样呢？
其实我想啊，这是开发者怕使用者new出来不去使用，hashMap的默认大小为16，new出来不用不就浪费资源了吗？那什么时候用？当然是put以后啦。

你内心OS：设计师鬼才啊！

inflateTable方法解析

	/**
     * Inflates the table.
     */
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        //这里又提到了2的幂，这个值最后作为了哈希表的容量，详见roundUpToPowerOf2
        int capacity = roundUpToPowerOf2(toSize);
		
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);//阈值
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

roundUpToPowerOf2方法

 /**
  * 为了计算哈希表容量的算法
  */
private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

传到这里的number就是双参构造器设置的那个数值。
其实这里就是找了个最接近你传入的值的2的幂。并向上取值
比如你传入的是15，那么这里返回的就是16。你传入17，那返回就是32。
所以不管你传什么，它最后都会变成2的幂。

putForNullKey方法

	/**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        //从数组下表为0开始查找,key为null的Entry
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        //如果没找到，就加到数组下表为0的位置，但是这个方法涉及了扩容，详见addEntry解析
        addEntry(0, null, value, 0);
        return null;
    }

这里又涉及到了一个知识点，看到return oldValue,你能无动于衷吗？说明put方法是返回旧值的value。知识点啊，拿本本记下来先。

hash方法解析

	/**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

这里做了很多异或、无符号左移运算，为的是让hash值分布变得更均匀一些。防止严重的hash碰撞，但即使是这样，还是会出现问题，jdk1.8中将这一串代码去掉了。
不过说实话，这么多运算有点恶心啊。

indexFor方法解析

	/**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

这是方法是不是很简单，就一个与运算，可是藏着大大的奥秘啊。
h参数是hash值，hash值是一个int范围的数字，length是当前hash表的桶的数量。
假设我们当前桶有16个，length-1 = 15。15的二进制表达为：0000 1111。
HashCode我们假设有三个，为1111 11111 ； 0101 0101 ；0010 0011
读者试着自己进行与运算，最后得出结果：
1111
0101
0011

三个结果都不大于0001 0000，即16，这就是与运算在这里的巧妙之处。设计师利用表最大长度-1，将高位置为0，低位全都置为1，使得与运算结果不超过这个表的最大长度。
这里就解答了，为什么容量一定是2的幂，是为了方便对hash值落在数组位置的计算！！！这样在不进行扩容的场景下，每次hash值都能找到对应的桶的位置。
不得不说一下，秒啊！！！设计师是村里最靓的仔。

addEntry方法解析

	/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            //扩容到之前的2倍容量，详见resize方法
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
       //采用头插法进行插入链表数据，详见createEntry方法分析
        createEntry(hash, key, value, bucketIndex);
    }

既然HashMap初始化容量为16，当put操作后，哈希表容量超过这个阈值，HashMap就会去进行扩容操作。这里可以看到，扩容后是之前容量的2倍。进一步验证了以容量必须以2的幂的问题。

resize方法解析：

	/**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        //扩容后，之前的元素怎么放到新的哈希表中呢？详见transfer方法
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

设计师在这里贴了个小tips，说扩容是自动进行的，你无感。
对所有的内容进行了rehash操作，就是重新算hash值，因为“搬家”后，你的东西放哪，是不是要告知计算机？
上面都是一些转换思想，没什么好说的，唯独transfer方法需要注意。

transfer方法（一切罪恶的根源）

	/**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        //遍历旧哈希表
        for (Entry<K,V> e : table) {
            while(null != e) {//遍历链表
                Entry<K,V> next = e.next;//保存下一次循环的Entry 
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                //重新计算它在新表的位置
                int i = indexFor(e.hash, newCapacity);
                //如果i位置原来没有值，则直接插入；有值，采用链头插入法
                e.next = newTable[i];
                newTable[i] = e;
                //轮替，为下一次循环
                e = next;
            }
        }
    }

注释告知我们这是把之前所有的entry搬到新的HashMap里。其实不用看注解你也知道啊。但是rehash操作，会打乱之前链表顺序。在多线程情况下可能导致死锁发生。

(1)正常的ReHash的过程
画了个图做了个演示。

我假设了我们的hash算法就是简单的用key mod 一下表的大小（也就是数组的长度）。
最上面的是old hash 表，其中的Hash表的size=2, 所以key = 3, 7, 5，在mod
2以后都冲突在table[1]这里了。
接下来的三个步骤是Hash表 resize成4，然后所有的<key,value> 重新rehash的过程

(2)并发下的Rehash
a.假设我们有两个线程。我用红色和浅蓝色标注了一下。
我们再回头看一下我们的 transfer代码中的这个细节：

do {
    Entry<K,V> next = e.next; // <--假设线程一执行到这里就被调度挂起了
    int i = indexFor(e.hash, newCapacity);
    e.next = newTable[i];
    newTable[i] = e;
    e = next;
} while (e != null);

而我们的线程二执行完成了。于是我们有下面的这个样子。
在这里插入图片描述
注意，因为Thread1的 e 指向了key(3)，而next指向了key(7)，其在线程二rehash后，指向了线程二重组后的链表。我们可以看到链表的顺序被反转后。

b.线程一被调度回来执行。
在这里插入图片描述
c.一切貌似顺利。

线程一接着工作。把key(7)摘下来，放到newTable[i]的第一个，然后把e和next往下移。先是执行 newTalbe[i] = e;
然后是e = next，导致了e指向了key(7)，
而下一次循环的next = e.next导致了next指向了key(3)
在这里插入图片描述
d.环形链接出现。

e.next = newTable[i] 导致 key(3).next 指向了 key(7)

注意：此时的key(7).next 已经指向了key(3)，环形链表就这样出现了。
在这里插入图片描述
于是，当我们的线程一调用到，HashMap.get(11)时，悲剧就出现了——Infinite Loop。(死循环)

有人把这个问题报给了Sun，不过Sun不认为这个是一个问题。因为HashMap本来就不支持并发。要并发就用HashTable或ConcurrentHashmap

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6423457
对于这个BUG，SUN死鸭子嘴硬不愿意改。
其实影响SUN公司升级HashMap的其实跟Tomcat的内部邮件有关系。
http://mail-archives.apache.org/mod_mbox/www-announce/201112.mbox/%3C4EFB9800.5010106@apache.org%3E
信中讲到HTTP GET使用HashMap存放请求的params，当参数数量过多时，若有人精心构造参数（DoS攻击），会在一个桶下挂一个很长的链表，会有很严重的hash冲突，导致服务器的性能很差。Tomcat官方提交了这个事故给Oracle公司，但是Oracle未能及时解决。于是Tomcat官方自己想出了临时解决方案(限制传入参数个数为10000个)

createEntry方法分析

 	/**
     * Like addEntry except that this version is used when creating entries
     * as part of Map construction or "pseudo-construction" (cloning,
     * deserialization).  This version needn't worry about resizing the table.
     *
     * Subclass overrides this to alter the behavior of HashMap(Map),
     * clone, and readObject.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];//将index位置数据放到e里
        table[bucketIndex] = new Entry<>(hash, key, value, e);//将e作为next节点，并将当前节点上移
        size++;
    }

这就是jdk1.7时，采用的头插法原理。

HashMap的GET方法

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

相比PUT ,GET 方法就更简单了。如果你认真看完了上面的内容，相信你可以自己分析。

2、JDK 1.8 HashMap

Java 8在 Java 7的基础上，解决了两个问题：
(1)出现死锁的问题（但依旧不是线程安全的）。
(2)链表导致性能问题。

jdk1.8将数组加链表的结构改进为数组加红黑树，还有将扩容时插入顺序的改进。

全局变量增加了

	/**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

    /**
     * The bin count threshold for untreeifying a (split) bin during a
     * resize operation. Should be less than TREEIFY_THRESHOLD, and at
     * most 6 to mesh with shrinkage detection under removal.
     */
    static final int UNTREEIFY_THRESHOLD = 6;

    /**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

TREEIFY_THRESHOLD指的是链表变成树形的阈值为8，
UNTREEIFY_THRESHOLD 指的是红黑树变成链表的阈值为7；
MIN_TREEIFY_CAPACITY 指的是最小树化容量。

这里就会引发别人思考这两个值怎么来的？
其实在注解里设计者有特意说明：
* Because TreeNodes are about twice the size of regular nodes, we * use them only when bins contain enough nodes to warrant use * (see TREEIFY_THRESHOLD). And when they become too small (due to * removal or resizing) they are converted back to plain bins. In * usages with well-distributed user hashCodes, tree bins are * rarely used. Ideally, under random hashCodes, the frequency of * nodes in bins follows a Poisson distribution * (http://en.wikipedia.org/wiki/Poisson_distribution) with a * parameter of about 0.5 on average for the default resizing * threshold of 0.75, although with a large variance because of * resizing granularity. Ignoring variance, the expected * occurrences of list size k are (exp(-0.5) * pow(0.5, k) / * factorial(k)). The first values are: * * 0: 0.60653066 * 1: 0.30326533 * 2: 0.07581633 * 3: 0.01263606 * 4: 0.00157952 * 5: 0.00015795 * 6: 0.00001316 * 7: 0.00000094 * 8: 0.00000006 * more: less than 1 in ten million
开发者说，红黑树的节点类型别链表的节点类型的空间一般要大两倍。所以，只有当桶里有足够多元素时，才去做树化操作。
然后他罗列了，一些数据，这些数据是根据泊松分布理论来的。
当数值为8时，概率比百万分之一还要小。所以树化时，桶里面的元素要大于8，反之小于6就变回链表。
我草，这尼玛设计师，还是位科学家！！！

1.8 HashMap源码我们就不细看了，主要看关键方法的不同点，我先画下现在java 8中HashMap的数据结构
在这里插入图片描述

也就是说我们数组下，挂的可能是树也可能是链表。

putVal方法

	 /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
           // 哈希表为空，初始化
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            //计算出的下标桶位置无节点，新增节点
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                //hash值存在情况
                e = p;
            else if (p instanceof TreeNode)
                //节点类型为树节点
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                //节点类型为链表情况
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        //如果桶数量大于等于树化阈值-1，也就是7，去树化桶
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

resize方法

	/**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // 经过上面的while循环处理，就是相当于把节点hash进行与计算分为2种情况:0 或 1
                        // 各自由一个链表负责，采用的尾插法
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

resize方法提供两种作用，第一个就是初始化表数据。第二个就是扩容操作。
扩容转移数据到新表里的时候，设计者特意注释了，转移数据的时候，保持了原表中数据的顺序，这样就不会形成环状链表，所以避免了死锁。但是要注意，它仍旧是线程不安全的容器。因为在并发时，容器里面的值可能被修改，你拿到的值可能和别人拿到的不一样。

这里注意事项：resize其实很费时，当你知道你的Map将会存放很多元素时，建议你初始化HashMap的大小。
也就是拿空间换时间这么一个概念。
空间和时间只能做一个权衡考量，不能面面俱到。

JDK 1.8 中 HashMap 的扩容操作就显得更加的骚气了，由于扩容数组的长度是 2 倍关系，所以对于假设初始 tableSize = 4 要扩容到 8 来说就是 0100 到 1000 的变化（左移一位就是 2 倍），在扩容中只用判断原来的 hash 值与左移动的一位（newtable 的值）按位与操作是 0 或 1 就行，0 的话索引就不变，1 的话索引变成原索引加上扩容前数组，由此保证了数据迁移的有序性，所以其实现如下流程图所示：
在这里插入图片描述