HashMap Source Code Analysis

做一只安静的猫

Article link: https://blog.csdn.net/weixin_36516088/article/details/115188247

Overview

This article covers HashMap as implemented in JDK 1.8. Before 1.8, HashMap was an array of linked lists; from 1.8 onward it is an array of linked lists and red-black trees.

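HashMap has the following characteristics:

Concern	Answer
Are nulls allowed?	Both keys and values may be null, but at most one key can be null
Are duplicates allowed?	Keys cannot repeat; putting an existing key overwrites its value
Is it ordered?	No
Is it thread-safe?	No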
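
The null-handling and overwrite behavior in the table can be checked with a few lines. This is only a small sketch; the class name HashMapTraitsDemo and the keys used are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapTraitsDemo {
    // Builds a small map that exercises the properties listed in the table.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");   // a null key is allowed
        map.put(null, "second");  // same (null) key again: the value is overwritten
        map.put("k", null);       // null values are allowed
        map.put("k2", null);      // and more than one value may be null
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());            // 3 (keys: null, "k", "k2")
        System.out.println(map.get(null));         // second
        System.out.println(map.containsKey("k"));  // true, even though its value is null
    }
}
```

Because get() returns null both for a missing key and for a key mapped to null, containsKey() is the way to tell the two cases apart.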

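HashMap's data structure

[Figure: a bucket array in which each slot is empty, holds a linked list, or holds a red-black tree]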

Key constants in HashMap

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default initial capacity: 1 shifted left by 4 bits, i.e. 16.

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

The maximum capacity: 2 to the 30th power.

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default load factor, 0.75. For example, with an initial capacity of 16, a resize is triggered once the number of entries exceeds 16 * 0.75 = 12.
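
The threshold arithmetic is easy to reproduce. The sketch below mirrors the (int)(loadFactor * capacity) computation that appears in resize(); the class and method names are hypothetical.

```java
class ThresholdDemo {
    // Resize threshold for a given capacity and load factor.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
        System.out.println(threshold(64, 0.75f)); // 48
    }
}
```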

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

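The bin size at which a linked list is converted to a red-black tree. Following the comment above and the check in putVal (binCount >= TREEIFY_THRESHOLD - 1), the conversion is attempted when a node is appended to a bin that already holds 8 nodes.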

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

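The bin size at or below which a red-black tree is converted back to a linked list when the bin is split during a resize.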

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

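The minimum table capacity required before a bin can be treeified. So converting a list to a red-black tree requires two conditions: the bin already holds at least 8 nodes when another is appended, and the table capacity is at least 64. If the capacity is smaller, the table is resized instead.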
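
The interaction of the two thresholds can be summarized as a small decision function. This is a simplified sketch of the rule described above, not the JDK code itself; the class, enum, and method names are invented for illustration.

```java
class TreeifyDecisionDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { APPEND_ONLY, RESIZE, TREEIFY }

    // nodesBeforeAppend: nodes already in the bin when a new key is appended to its list.
    // tableCapacity: current length of the bucket array.
    static Action onAppend(int nodesBeforeAppend, int tableCapacity) {
        if (nodesBeforeAppend < TREEIFY_THRESHOLD) {
            return Action.APPEND_ONLY;   // list is still short enough
        }
        if (tableCapacity < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;        // small table: grow it instead of treeifying
        }
        return Action.TREEIFY;           // long list in a large enough table
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64));  // APPEND_ONLY
        System.out.println(onAppend(8, 16));  // RESIZE
        System.out.println(onAppend(8, 64));  // TREEIFY
    }
}
```

The sketch only covers the treeification decision; the separate size > threshold check at the end of putVal can still trigger a resize on its own.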

Source code walkthrough

We use a minimal snippet to walk through HashMap's put method:

HashMap<Integer,String> map = new HashMap<>();
map.put(1,"1");
map.put(1,"2");
......
map.put(17,"17");

Declaring the map

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

As you can see, new HashMap() does almost nothing: it only sets the default load factor. We can also create a HashMap with the parameterized constructors:

/**
 * Constructs a HashMap with the specified initial capacity and load factor
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    ......
}

/**
 * Uses the default load factor with the specified initial capacity
 */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

The put operation

The put method is:

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Now look at hash():

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

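(h = key.hashCode()) ^ (h >>> 16) XORs the key's hash code with the same value shifted right by 16 bits, that is, it XORs the high 16 bits of hashCode() into the low 16 bits. The motivation is a balance of speed, utility, and quality: even when the table length is small, both the high and low bits take part in choosing the slot, at very little cost. For example: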

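Take the key 1, whose hashCode is 1. As a 32-bit binary number it is 00000000000000000000000000000001; shifted right by 16 bits it becomes 00000000000000000000000000000000. XOR the two:

00000000000000000000000000000001

00000000000000000000000000000000

The result is 00000000000000000000000000000001. Here the high 16 bits are all zero, so the hash is unchanged; for a key whose hashCode has nonzero high bits, those bits are folded into the low 16 bits and so influence the slot index.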
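
The effect is easier to see with hash codes that differ only in their high bits. The sketch below reuses the same XOR-shift expression; the two sample values 0x10001 and 0x20001 and the helper names are chosen purely for illustration.

```java
class HashSpreadDemo {
    // Same computation as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Slot index for a table of length n (n must be a power of two).
    static int index(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int a = 0x10001, b = 0x20001; // differ only in the high 16 bits
        // Without spreading, both land in slot 1 of a 16-slot table.
        System.out.println(index(a, 16) + " " + index(b, 16));                 // 1 1
        // With spreading, the high bits separate them.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16)); // 0 3
    }
}
```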

Step into putVal:

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is null or empty: initialize it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the target slot is empty, create a new node and place it there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else { // The slot is occupied: same key or a hash collision
        Node<K,V> e; K k;
        // If the head node has the same hash and an equal key, its value is replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // If the slot holds a red-black tree, do a tree insertion
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise the slot holds a linked list: walk it looking for the key or the tail
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // At the tail: link the new node after it
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // Move on to the next node
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Count structural modifications; iterators use this to fail fast on concurrent modification
    ++modCount;
    // If size now exceeds the threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
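
The return values of putVal are visible through the public API. The short sketch below (class name invented for illustration) shows the three cases: a new key, an existing key, and an existing key with onlyIfAbsent set, which is how putIfAbsent calls putVal in JDK 8.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Returns the values observed from put/putIfAbsent, plus the final mapping for key 1.
    static String[] run() {
        Map<Integer, String> map = new HashMap<>();
        String first = map.put(1, "1");         // new key: null is returned
        String second = map.put(1, "2");        // existing key: old value "1" is returned
        String third = map.putIfAbsent(1, "3"); // existing value "2" is kept and returned
        return new String[] { first, second, third, map.get(1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [null, 1, 2, 2]
    }
}
```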

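When the first map.put(1, "1") runs, the hash table has not been initialized, because new HashMap() did not allocate it. From the code above we can guess that resize() initializes the table. Here is resize():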

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // Already at the maximum capacity: cannot grow further, return the table as is
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
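        // Otherwise double the capacity; if the new capacity is below the maximum
        // and the old capacity is at least 16, double the threshold too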
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // a constructor with an initial capacity was used; it was stored in threshold
        newCap = oldThr;
    else {               // the default constructor was used
        // Initialize with the default capacity and the matching resize threshold
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Allocate the new array
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) { // Skipped on the first put after new HashMap(); entered only on a real resize
        // Visit each slot
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) { // the head node of slot j
                // Help garbage collection
                oldTab[j] = null;
                if (e.next == null) // a single node in this slot: just recompute its index
                    // e.hash & (newCap - 1) is e.hash modulo newCap (non-negative), since newCap is a power of two
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode) // red-black tree: split the tree
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // linked list: analyzed separately below
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
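
To see how capacity and threshold evolve for a map created with new HashMap<>(), the following sketch replays only the bookkeeping of resize() and the size check at the end of putVal. It is a simplification under stated assumptions: it ignores MAXIMUM_CAPACITY and any resize triggered by long bins, and the class and method names are hypothetical.

```java
class GrowthDemo {
    // Returns {capacity, threshold} after inserting `entries` distinct keys
    // into a map created with the no-argument constructor.
    static int[] afterInserts(int entries) {
        int cap = 0, thr = 0, size = 0;
        for (int i = 0; i < entries; i++) {
            if (cap == 0) {       // first put: resize() allocates the default table
                cap = 16;
                thr = 12;         // (int)(0.75f * 16)
            }
            if (++size > thr) {   // same check as at the end of putVal
                cap <<= 1;        // double the capacity
                thr <<= 1;        // double the threshold
            }
        }
        return new int[] { cap, thr };
    }

    public static void main(String[] args) {
        int[] r = afterInserts(17);
        System.out.println(r[0] + " " + r[1]); // 32 24
    }
}
```

With 12 entries the table stays at 16 slots; the 13th insert pushes size past the threshold and doubles the table to 32.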

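The comments above explain what resize() does on the first put after new HashMap(), but they do not fully explain what happens during a real resize. Below I state the conclusions first and then walk through an example.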

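When a slot holds a single node, its index is simply recomputed against the new table.
Red-black trees are not discussed here.
When a slot holds a linked list, each node in the list is visited: if (node.hash & oldCap) == 0 the node stays at its original index, otherwise it moves to the original index plus the old table capacity.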

The demonstration follows:

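Suppose we have four keys: 1, 17, 33, and 49. With a table of capacity 16, before the resize all four map to the same slot (index 1) and are chained in a linked list:
[Figure: keys 1, 17, 33, 49 chained in slot 1]

What happens when the linked-list branch of resize() runs?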

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
    next = e.next;
    if ((e.hash & oldCap) == 0) {
        if (loTail == null)
            loHead = e;
        else
            loTail.next = e;
        loTail = e;
    }
    else {
        if (hiTail == null)
            hiHead = e;
        else
            hiTail.next = e;
        hiTail = e;
    }
} while ((e = next) != null);
if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead;
}
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead;
}

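First iteration: e = 1, next = 17, e.hash & oldCap == 0, so loHead = 1, loTail = 1.

Second iteration: e = 17, next = 33, e.hash & oldCap == 16, so hiHead = 17, hiTail = 17.

Third iteration: e = 33, next = 49, e.hash & oldCap == 0, so 33 is appended to the lo list (1 -> 33) and loTail = 33.

Fourth iteration: e = 49, next = null, e.hash & oldCap == 16, so 49 is appended to the hi list (17 -> 49) and hiTail = 49.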

After the traversal:

if (loTail != null) {
    loTail.next = null;
    newTab[j] = loHead; // put loHead at the original index
}
if (hiTail != null) {
    hiTail.next = null;
    newTab[j + oldCap] = hiHead; // put hiHead at the original index + old table length
}
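
The walkthrough can be replayed on plain integers. The sketch below splits the four hashes with the same (hash & oldCap) test and then recomputes each index from scratch in the new table, to confirm both routes agree. The class and method names are illustrative; lists of Integer stand in for the node chains.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Splits the hashes of one bin into the "lo" list (stays at index j)
    // and the "hi" list (moves to index j + oldCap), preserving order.
    static List<List<Integer>> split(int[] hashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : hashes) {
            if ((h & oldCap) == 0) {
                lo.add(h);
            } else {
                hi.add(h);
            }
        }
        List<List<Integer>> result = new ArrayList<>();
        result.add(lo);
        result.add(hi);
        return result;
    }

    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {1, 17, 33, 49};
        System.out.println(split(hashes, oldCap)); // [[1, 33], [17, 49]]
        for (int h : hashes) {
            // Index recomputed directly against the new table: 1, 17, 1, 17
            System.out.println(h + " -> " + (h & (newCap - 1)));
        }
    }
}
```

The lo list ends up at index 1 and the hi list at index 1 + 16 = 17, which matches the directly recomputed indices.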

That covers the key points of put(), apart from the red-black tree part.

Other points worth noting in the source

Why must the table length be a power of two?

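Working backward from the code, this has several benefits:

It makes resizing cheap. As shown above, on a resize each node either stays at its index or moves to its index plus the old table length, so the move is efficient.
It makes the modulo cheap: e.hash & (n - 1) equals the non-negative remainder of e.hash modulo n, which holds only when n is a power of two.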
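
The mask-equals-modulo claim can be checked directly. The sketch below compares the bit mask against Math.floorMod, which returns a non-negative remainder for a positive divisor; the class and method names are made up for this example.

```java
class MaskVsModDemo {
    // True if masking with (n - 1) gives the same slot as a non-negative modulo.
    static boolean sameIndex(int hash, int n) {
        return (hash & (n - 1)) == Math.floorMod(hash, n);
    }

    public static void main(String[] args) {
        System.out.println(sameIndex(17, 16)); // true:  17 & 15 = 1, 17 mod 16 = 1
        System.out.println(sameIndex(-1, 16)); // true:  both give 15
        System.out.println(sameIndex(5, 10));  // false: 5 & 9 = 1, but 5 mod 10 = 5
    }
}
```

With a length that is not a power of two, the mask skips some slots entirely, which is why the table length is kept at a power of two.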

The hash function

As explained above, XORing the high 16 bits into the low 16 bits is a cheap and clever way to compute the hash.

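Why were red-black trees introduced? When does a bin become a red-black tree?

Red-black trees guard against the extreme case where a bin degenerates into one long linked list and lookups cost O(n); with a tree, the worst case is O(log n).

A bin is converted to a red-black tree only when both conditions hold: a node is appended to a bin that already holds at least 8 nodes, and the table capacity is at least 64.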

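Why is HashMap not thread-safe?

First, the code contains non-atomic operations, such as ++modCount and ++size.
Second, if one thread is migrating nodes during a resize while another thread puts an element, data may be lost. The cause is the following code: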
```
if ((e = oldTab[j]) != null) {
    oldTab[j] = null; // help garbage collection
    ......
  }
```

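Imagine the second thread finishes its put, and then the first thread sets the slot it just wrote to null: the entry is lost.

Before 1.8, HashMap's put used head insertion, so a race between threads during a resize could link a list into a cycle, causing an infinite loop and 100% CPU usage.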
