HashMap (JDK 1.8): Ideas Behind the Source Code

技术不值钱

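Original articles on this blog may not be used for commercial purposes without the author's permission. Reposts must credit the source; otherwise the author reserves the right to pursue legal action.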

Article link: https://blog.csdn.net/cow4j/article/details/119534178

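HashMap implements the Map interface and stores elements as key-value pairs. It allows a null key and null values, does not guarantee any iteration order, and is not thread-safe.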

HashMap vs. Hashtable

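All of Hashtable's methods are declared synchronized, so Hashtable is thread-safe.
HashMap accepts null as a key, but Hashtable does not. Hashtable also rejects null values.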

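This section focuses on why Hashtable does not allow null. HashMap treats a null key as a special case. Its hash() method returns 0 for null directly, so a null key always lives in bucket 0, the first slot of the array. Hashtable simply throws an exception (a NullPointerException) instead. Why doesn't Hashtable add a similar special case?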
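The sketch below shows the two behaviors side by side. NullHandlingDemo and its accepts helper are hypothetical names chosen for illustration. The only library calls are the standard put methods of java.util.HashMap and java.util.Hashtable, and the sketch assumes the JDK 8 behavior described above, where Hashtable rejects both null keys and null values.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class: compares how HashMap and Hashtable react to nulls.
class NullHandlingDemo {

    // Returns true if the map accepted put(key, value) without throwing.
    static boolean accepts(Map<String, String> map, String key, String value) {
        try {
            map.put(key, value);
            return true;
        } catch (NullPointerException e) {
            // Hashtable throws NullPointerException for a null key or a null value.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(new HashMap<>(), null, "v"));   // true
        System.out.println(accepts(new Hashtable<>(), null, "v")); // false
        System.out.println(accepts(new Hashtable<>(), "k", null)); // false
    }
}
```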

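Suppose Hashtable allowed null keys and values. If get(key) returned null, we could not tell whether the key actually exists. We would have to call containsKey to decide whether the key is mapped to null or is absent. If the key exists, get returns its value, which here is null. If the key does not exist, get also returns null.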
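Here is a minimal sketch of that ambiguity. It uses a HashMap, because HashMap does allow null values. The describe helper is hypothetical. It performs exactly the get-then-containsKey check described above, and those two calls together are not atomic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: get(key) == null alone cannot distinguish
// "mapped to null" from "absent", so a second lookup is needed.
class NullValueAmbiguity {

    // Classifies a key with get followed by containsKey.
    // The two calls are separate operations; under concurrent modification
    // the answer may already be stale when it is returned.
    static String describe(Map<String, String> map, String key) {
        String v = map.get(key);
        if (v != null) {
            return "value=" + v;
        }
        return map.containsKey(key) ? "mapped to null" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);
        System.out.println(map.get("a") == map.get("b")); // true: both are null
        System.out.println(describe(map, "a"));           // mapped to null
        System.out.println(describe(map, "b"));           // absent
    }
}
```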

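Hashtable, however, is designed to be used from multiple threads. Between our containsKey check and the following get(key) call, another thread may change the data, for example with map.put(key, null). The key did not exist when we checked it, so the correct answer is "absent". By the time get runs, though, the key exists, and get still returns null. Both calls return the same value, but the conclusion we draw no longer matches the real state of the map.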

Data structure

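HashMap stores key-value pairs in an array of buckets. In JDK 1.8, each pair is a Node object; the class was called Entry in JDK 1.7, and Node implements Map.Entry. Nodes that land in the same bucket are linked through their next field into a singly linked list, which is how hash collisions are resolved. Since JDK 1.8, a list that grows too long may also be converted (treeified) into a red-black tree.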
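To make the array-of-lists layout concrete, here is a deliberately simplified sketch. SimpleChainedMap is a hypothetical teaching class, not the JDK code. It borrows the JDK 8 hash() spreading and the (n - 1) & hash index. It assumes a fixed power-of-two capacity and leaves out resizing and treeification.

```java
import java.util.Objects;

// Hypothetical teaching class: separate chaining with a fixed-size bucket array.
// No resizing and no treeification, unlike the real HashMap.
class SimpleChainedMap<K, V> {

    // One mapping; next links nodes that share a bucket.
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    SimpleChainedMap(int powerOfTwoCapacity) {
        table = (Node<K, V>[]) new Node[powerOfTwoCapacity];
    }

    // Same expression as JDK 8 HashMap.hash(): a null key hashes to 0.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value; // key already present: replace its value
                e.value = value;
                return old;
            }
        }
        // New key: prepend to the bucket for brevity (JDK 8 appends at the tail).
        table[i] = new Node<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Prepending keeps the insert short. The lookup logic is the same either way, because every node in a bucket is compared by hash first and then by equals.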

Source code analysis

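Anyone familiar with HashMap knows the basics. Its default capacity is 16. It is built from an array plus linked lists, and entries that collide form a list in their bucket. It grows by powers of two. But have you ever asked why the capacity is always a power of two?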

I won't walk through the annotated source below line by line. Instead, here are a few key points of HashMap:

key=null

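When we put an entry with a null key into a HashMap, HashMap first searches bucket 0 for a node whose key is null; that bucket may hold a linked list or a tree. If it finds one, it updates the value. Otherwise it adds a new node to bucket 0. For a linked-list bucket, JDK 1.8 appends the node at the tail, whereas JDK 1.7 inserted it at the head.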
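A quick check of this update behavior through the public API. NullKeyUpdate is a hypothetical class name, and only standard HashMap methods are used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a second put with a null key replaces the value
// of the existing entry instead of adding another one.
class NullKeyUpdate {

    static Map<String, String> putNullTwice() {
        Map<String, String> map = new HashMap<>();
        String first = map.put(null, "first");   // no previous mapping, returns null
        String second = map.put(null, "second"); // returns the old value "first"
        System.out.println(first + " -> " + second); // prints "null -> first"
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = putNullTwice();
        System.out.println(map.get(null)); // second
        System.out.println(map.size());    // 1
    }
}
```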

When does it resize?

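When does HashMap resize? The default initial capacity is 16, which is a power of two. If we specify a capacity that is not a power of two, HashMap rounds it up to the smallest power of two greater than or equal to the given value. Ask for 7, and HashMap uses 8.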
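The rounding can be sketched with the bit-smearing trick used by tableSizeFor in the JDK 8 source; the class name TableSize is only a wrapper for illustration. Copying the highest set bit of cap - 1 into every lower bit produces a value of the form 0...011...1. Adding 1 to that value gives the power of two.

```java
// Hypothetical wrapper around the JDK 8 tableSizeFor logic:
// rounds cap up to the nearest power of two (>= cap).
class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 so exact powers of two map to themselves
        n |= n >>> 1;     // smear the highest set bit into the lower positions
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;    // n now has the form 0...011...1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(7));  // 8
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```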

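HashMap has a loadFactor field, the load factor, which defaults to 0.75. The resize threshold is threshold = capacity (the current capacity) * loadFactor. Once the number of entries exceeds the threshold, meaning the table is more than 75% full, HashMap resizes, and every resize doubles the capacity.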
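The growth rule can be simulated in a few lines. This is a simplified model, not HashMap itself. ResizeSimulation is a hypothetical name, the model assumes every insert adds a new distinct key, and it ignores MAXIMUM_CAPACITY. It shows that, with the defaults, the 13th entry triggers the first resize: 16 * 0.75 = 12, and the check is ++size > threshold.

```java
// Hypothetical simplified model of HashMap's growth rule (distinct keys only,
// no MAXIMUM_CAPACITY handling).
class ResizeSimulation {

    static int capacityAfterInserts(int inserts, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < inserts; i++) {
            if (++size > threshold) {  // same check as at the end of putVal
                capacity <<= 1;        // the table doubles
                threshold = (int) (capacity * loadFactor);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfterInserts(12, 16, 0.75f)); // 16: threshold is 12
        System.out.println(capacityAfterInserts(13, 16, 0.75f)); // 32
        System.out.println(capacityAfterInserts(25, 16, 0.75f)); // 64
    }
}
```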

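Why resize at 0.75? Explanations usually bring in probability and statistics, namely the Poisson distribution. I don't know that area well, so I won't discuss it here. What matters is that 0.75 is a trade-off between time and space, which is how the HashMap Javadoc describes it. A higher load factor saves memory but makes hash collisions more likely. More collisions mean longer lists in the affected buckets and slower lookups.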

When does treeification happen?

Most material I have read says a list is treeified once its length reaches 8. Is that really the whole story?

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // ......
}

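Look at the code above. It contains one more check: (n = tab.length) < MIN_TREEIFY_CAPACITY. While the array length is below 64, the bin is not treeified; the table is resized instead. Treeification therefore requires two conditions: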

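The bin's linked list grows beyond 8 nodes (TREEIFY_THRESHOLD)
The array length is at least 64 (MIN_TREEIFY_CAPACITY)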
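The two conditions can be written as a small decision function. TreeifyDecision and afterAppend are hypothetical names. The logic follows putVal, which calls treeifyBin when binCount >= TREEIFY_THRESHOLD - 1. Because binCount counts from 0 at the head node, this fires when the newly appended node is the 9th in the bin. The logic also follows treeifyBin, which resizes instead while the table length is below MIN_TREEIFY_CAPACITY.

```java
// Hypothetical helper mirroring what putVal/treeifyBin decide after a node
// is appended to a linked-list bin (JDK 8 constants).
class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    // nodesInBin: list length after the append; tableLength: current array length.
    static Action afterAppend(int nodesInBin, int tableLength) {
        // A bin of up to 8 nodes stays a plain linked list.
        if (nodesInBin <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;
        }
        // A longer list is treeified only if the table is already large enough.
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }
}
```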

So why 8 and 64?

Most material only addresses the list-length condition of 8. It claims that 8 was derived by calculating lookup efficiency. Is that really so?

* Implementation notes.
*  ......
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.  In
* usages with well-distributed user hashCodes, tree bins are
* rarely used.  Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million

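The Implementation notes comment in the HashMap source says roughly the following. When user hashCodes are well distributed, tree bins are rarely used: data spreads evenly across the bins, and almost no bin's list reaches the threshold. The JDK cannot stop users from writing poor hashCode implementations, though, so data may end up unevenly distributed, and tree bins guard against that case. Ideally, with random hashCodes, the number of nodes per bin follows a Poisson distribution with a parameter of about 0.5. The probability that a list reaches 8 elements is then 0.00000006, which is practically impossible. So 8 was not chosen for lookup efficiency, as some material claims. It was chosen based on this probability distribution.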
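The probabilities in the comment are easy to reproduce from the formula it gives, exp(-0.5) * pow(0.5, k) / factorial(k). A minimal sketch follows. BinSizePoisson is a hypothetical name, and lambda = 0.5 is the parameter quoted in the comment.

```java
// Hypothetical helper: Poisson probability that a bin holds exactly k nodes.
class BinSizePoisson {

    static double probability(int k, double lambda) {
        double factorial = 1;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Prints the same table as the implementation notes, k = 0..8.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k, 0.5));
        }
    }
}
```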

How is the hash computed?

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

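This is not just the raw hashCode. The high 16 bits of the key's hashCode are mixed into the result to reduce hash collisions. That explanation alone can still be confusing, so let's look at an example: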

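(a 0b prefix denotes a binary number)
Suppose: key.hashCode() = 0b 0010 0101 1010 1100 0011 1111 0010 1110
Then: h = 0b 0010 0101 1010 1100 0011 1111 0010 1110
^ is XOR: equal bits give 0, different bits give 1
h >>> 16 is an unsigned right shift by 16: the low 16 bits are dropped and 16 zeros are filled in at the top
h >>> 16 = 0b 0000 0000 0000 0000 0010 0101 1010 1100
XOR it with h:
0b 0010 0101 1010 1100 0011 1111 0010 1110
0b 0000 0000 0000 0000 0010 0101 1010 1100
0b 0010 0101 1010 1100 0001 1010 1000 0010 (equal bits give 0, different bits give 1)
HashMap computes the bucket index as (n - 1) & hash, where n is the table length and hash is the value returned by hash() above
& is AND: a bit is 1 only if both input bits are 1, otherwise 0
Suppose the table length is 16. Then 16 - 1 in binary is 0b 0000 0000 0000 0000 0000 0000 0000 1111
When the length is small, the high bits of n - 1 are all 0, so the high bits of the & result are always 0 and only the low bits matter.
Mixing the high bits of key.hashCode() into the low bits makes the low bits more random, which reduces hash collisions while the table is small.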
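The same arithmetic in code, as a sketch. The spread method reproduces the expression in HashMap.hash() without the null check, and index is the (n - 1) & hash formula. HashSpread is a hypothetical name. The three hash codes 0x10000, 0x20000 and 0x30000 are made-up values, chosen because they differ only above bit 15.

```java
// Hypothetical demo class: bucket indices with and without high-bit mixing.
class HashSpread {

    // Same expression as JDK 8 HashMap.hash(), minus the null check.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        // The worked example: 0b 0010 0101 1010 1100 0011 1111 0010 1110
        int h = 0b0010_0101_1010_1100_0011_1111_0010_1110;
        System.out.println(Integer.toHexString(spread(h))); // 25ac1a82

        // Without mixing, these codes all land in bucket 0 of a 16-slot table.
        int[] codes = {0x10000, 0x20000, 0x30000};
        for (int c : codes) {
            System.out.println(index(c, 16) + " vs " + index(spread(c), 16)); // 0 vs 1, 0 vs 2, 0 vs 3
        }
    }
}
```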

Why a power of two?

In my view, this is the cleverest part of HashMap's design, so let's analyze it step by step:

If I had to compute which array slot a key falls into, I would use key.hashCode() % array.length, which always yields a value within the array's length. HashMap instead uses (array.length - 1) & hash.

2^4 = 16, binary 10000
2^5 = 32, binary 100000
2^6 = 64, binary 1000000

In binary, a power of two is a single 1 followed only by 0s. Subtracting 1 from the lengths above gives 1111, 11111 and 111111 respectively.

Suppose the hash of a key is 0b10011011001 (1241 in decimal). Then 0b10011011001 % 0b10000 = 1001 (1241 % 16 = 9).

With the bitwise AND instead: 0b10011011001 & 0b1111 = 1001.

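Notice the pattern? When the length is a power of two, both methods give the same result for non-negative hash values. & is faster than %, so the index computation is faster. For negative hash codes, Java's % even returns a negative number, while & always yields a valid index. Now take a length of 15, which is not a power of two. 15 is 1111 in binary, and 15 - 1 is 1110. For the two hash values 1111 and 1110, 1111 & 1110 equals 1110 & 1110, so they collide. In fact the lowest bit of every index is then always 0, so odd slots are never used. A power-of-two length therefore both improves performance and reduces the collision rate.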
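A small sketch checks both claims. MaskVsModulo and its methods are hypothetical names. It uses Math.floorMod as the reference, because Java's % returns negative results for negative inputs.

```java
// Hypothetical demo class: mask indexing vs. modulo indexing.
class MaskVsModulo {

    // Reference index: floorMod keeps the result in [0, n).
    static int modIndex(int h, int n) {
        return Math.floorMod(h, n);
    }

    // HashMap-style index; equals modIndex only when n is a power of two.
    static int maskIndex(int h, int n) {
        return (n - 1) & h;
    }

    // Counts how many distinct buckets the mask produces for hashes 0..range-1.
    static int reachableBuckets(int n, int range) {
        boolean[] seen = new boolean[n];
        int count = 0;
        for (int h = 0; h < range; h++) {
            int i = maskIndex(h, n);
            if (!seen[i]) {
                seen[i] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(modIndex(1241, 16) + " " + maskIndex(1241, 16)); // 9 9
        System.out.println(reachableBuckets(16, 1000)); // 16: every bucket is used
        System.out.println(reachableBuckets(15, 1000)); // 8: mask 1110 never yields an odd index
    }
}
```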

Is that all? No. There is another benefit: it makes relocating entries after a resize easy.

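For example, with a length of 16, only the last 4 bits of hash take part in the index. After doubling to 32, the last 5 bits do. The newly added 5th bit decides where the entry goes. If it is 0, the entry keeps its old index after the resize. If it is 1, the new index is (old index + old capacity).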

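16 doubles to 32
16 - 1 in binary is 1111
32 - 1 in binary is 11111
Suppose the key's hash is 0b 0100 1101 1001
0b 0100 1101 1001 & 0b 1111 = 1001 (9 in decimal, so the entry is in slot 9)
0b 0100 1101 1001 & 0b 11111 = 11001 (25 in decimal, so the entry moves to slot 25 = 9 + 16)
Notice the pattern? If bit 5 of the hash is 0, the index after the resize is the same as before. Here bit 5 is 1, so the new index is the old index plus the old capacity.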
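The rule can be checked exhaustively over a range of hashes. ResizeSplit is a hypothetical helper. newIndexBySplit applies the (hash & oldCap) test that resize() uses to choose the low or high list, and newIndexDirect recomputes the index with the new mask.

```java
// Hypothetical helper: after doubling, a node either keeps index j
// or moves to j + oldCap, decided by the single bit (hash & oldCap).
class ResizeSplit {

    static int newIndexBySplit(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                  // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap; // "lo" list vs. "hi" list
    }

    static int newIndexDirect(int hash, int oldCap) {
        return hash & (2 * oldCap - 1);               // recompute with the new mask
    }

    public static void main(String[] args) {
        int hash = 0b0100_1101_1001;                   // the example above
        System.out.println(newIndexBySplit(hash, 16)); // 25 = 9 + 16
    }
}
```

resize() only needs to test one bit per node and never recomputes a full index, which is why it can split each bucket into a "lo" list and a "hi" list in a single pass.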

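The same pattern can be applied when scaling out databases, to limit how much data has to be migrated after expansion. I'll cover that later when writing about data storage.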

Below are excerpts from the HashMap source with my comments:

/**
* The default initial capacity - MUST be a power of two.
* Default table size: 16
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* MUST be a power of two <= 1<<30.
* Maximum length of the table array
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
* Default load factor
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* Treeify threshold
* A bin whose linked list grows beyond 8 nodes may be converted into a tree
*/
static final int TREEIFY_THRESHOLD = 8;
/**
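* Untreeify threshold: a tree bin that shrinks to this size (checked when bins are split during resize) is converted back into a linked list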
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* Bins may only be treeified once the table (array) length is at least 64; below that, the table is resized instead
*/
static final int MIN_TREEIFY_CAPACITY = 64;
/**
* Static nested class: a bin node holding one key-value mapping
*/
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next; // next node in the same bin (singly linked list)

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

/* ---------------- Static utilities -------------- */

/**
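* Purpose: let the high 16 bits of the key's hash take part in the index calculation
* Suppose: h = 0b 0010 0101 1010 1100 0011 1111 0010 1110
* 0b 0010 0101 1010 1100 0011 1111 0010 1110
* ^ (XOR: equal bits give 0, different bits give 1)
* 0b 0000 0000 0000 0000 0010 0101 1010 1100 (unsigned right shift by 16, 16 zeros filled in at the top)
* =  0010 0101 1010 1100 0001 1010 1000 0010
*
* The index is computed as (n - 1) & hash
* After the unsigned right shift and XOR, the high 16 bits also affect the index
* even when the table is short, which reduces hash collisions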
*/
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
/**
* The hash table (bucket array)
* When is it initialized? Lazily, by resize() on the first insertion
*/
transient Node<K,V>[] table;
/**
     * The number of key-value mappings contained in this map.
     * Number of key-value mappings currently in this map
     */
transient int size;
/**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     * Number of structural modifications made to this map
     */
transient int modCount;
/**
     * Resize threshold: when the number of entries exceeds it, a resize is triggered
     */
int threshold;
/**
     * The load factor for the hash table.
     *
     * @serial
     * Load factor
     * threshold = capacity * loadFactor
     */
final float loadFactor;
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    //The capacity may not exceed the maximum; larger values are clamped to MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    //The load factor must be greater than 0 and must not be NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    /*
        tableSizeFor
        returns the smallest power of two that is greater than or equal to cap
         */
    this.threshold = tableSizeFor(initialCapacity);
}

final Node<K,V> getNode(int hash, Object key) {
    //tab: reference to this HashMap's table
    //first: head node of the bucket
    //e: temporary node
    //n: length of the table array
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    //Check that the table exists and the target bucket is not empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        //The bucket's head node is the one we are looking for
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        //The bucket holds more than one node: it is either a linked list or a red-black tree
        if ((e = first.next) != null) {
            //The bucket has been converted into a red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            //The bucket is a linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    /*
        tab: reference to this HashMap's table
        p:   the node currently in the bucket
        n:   length of the table array
        i:   the computed bucket index
         */
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //Initialize the table lazily on the first insertion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;

    /*
        (n - 1) & hash computes the index
        If that bucket is empty, simply place a new node for this key-value pair there
         */
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        /*
            The bucket already holds data
            e: if not null, an existing node whose key equals the key being inserted
            k: a temporary key
             */
        Node<K,V> e; K k;
        // The head node's key equals the key being inserted; its value is replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket has been treeified
        // (its list grew beyond 8 nodes while the table length was at least 64)
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list, and its head's key differs from the key being inserted
            for (int binCount = 0; ; ++binCount) {
                //Reached the end of the list without finding a node with the same key,
                //so append a new node at the tail
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    //The list has reached the treeify threshold: treeify it (treeifyBin resizes instead if the table is shorter than 64)
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found a node with the same key; its value is replaced below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    //Count structural modifications; replacing the value of an existing Node is not counted
    ++modCount;
    //A new node was inserted: increment size, and resize if it now exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

/**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     *
     * Resize
     * Why resize?
     * Resizing enlarges the array: long lists hurt lookup performance,
     * so growing the table limits the chaining caused by hash collisions
     *
     */
final Node<K,V>[] resize() {
    //oldTab: the table before resizing
    Node<K,V>[] oldTab = table;
    //oldCap: length of the table array before resizing
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    //oldThr: threshold before resizing, i.e. the one that triggered this resize
    int oldThr = threshold;
    //newCap: length of the table array after resizing
    //newThr: threshold that will trigger the next resize
    int newCap, newThr = 0;
    //True when the table has already been initialized: a normal resize
    if (oldCap > 0) {
        //The old capacity has already reached the maximum: do not resize, just set the threshold to Integer.MAX_VALUE
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        //Double oldCap (left shift by one) into newCap; if newCap is below the maximum and oldCap >= 16,
        //the next threshold is simply the current threshold doubled
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    //oldCap == 0: the table has not been allocated yet, but oldThr > 0 holds the initial capacity, set by
    //1. new HashMap(initCap, loadFactor);
    //2. new HashMap(initCap);
    //3. new HashMap(map); when the map is not empty
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    //oldCap == 0 and oldThr == 0:
    //new HashMap();
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    //If newThr is still 0, compute it from newCap and loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }

    threshold = newThr;
    //Allocate the new, larger array
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    //oldTab != null: the table already existed before this resize
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            //current node
            Node<K,V> e;
            //The bucket has data; it may be a single node, a linked list, or a red-black tree
            if ((e = oldTab[j]) != null) {
                //Clear the old slot so the GC can reclaim memory
                oldTab[j] = null;
                //The bucket holds a single node (no collision): compute its index in the new array and place it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //The bucket has been treeified
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                //Linked-list bucket
                else { // preserve order
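                    //Low list: nodes whose index in the new array equals their current index
                    Node<K,V> loHead = null, loTail = null;
                    //01111: bucket 15 (with oldCap = 16)
                    //11111: bucket 31
                    //After the resize, a node from bucket 15 has index 01111 or 11111:
                    //01111 stays at index 15, 11111 moves to index 31
                    //High list: nodes whose new index is the current index + oldCap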
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
    //tab: reference to this HashMap's table
    //p: current node
    //n: length of the table array
    //index: the computed bucket index
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    //If the bucket has data, search it for the node to delete
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        //node: the node found
        //e: the node after the current one
        Node<K,V> node = null, e; K k; V v;
        //The bucket's head node is the one to delete
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        //Otherwise the bucket may be a linked list or a tree
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode)
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        //If a node was found, delete it
        //With matchValue set, the node is deleted only if its value also matches the given value
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
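Public methods reach the onlyIfAbsent and matchValue flags in putVal and removeNode. In the JDK 8 source, putIfAbsent calls putVal with onlyIfAbsent = true, and remove(key, value) calls removeNode with matchValue = true. Here is a short sketch of the resulting behavior; FlagDemo is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: observable effects of onlyIfAbsent and matchValue.
class FlagDemo {

    static Map<String, String> run() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");
        map.put("n", null);
        map.putIfAbsent("a", "2"); // existing non-null value "1" is kept
        map.putIfAbsent("n", "3"); // oldValue == null, so putVal replaces it with "3"
        map.remove("a", "wrong");  // value does not match: nothing is removed
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = run();
        System.out.println(map.get("a") + " " + map.get("n")); // 1 3
    }
}
```

Because putVal's check is !onlyIfAbsent || oldValue == null, putIfAbsent also overwrites a key that is explicitly mapped to null.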