HashMap Source Code Analysis

Jerry's Tech Blog

Category: Java basics. Tags: HashMap, source code analysis

Article link: https://blog.csdn.net/xktxoo/article/details/80859048

1. How HashMap's get() method works in Java

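Hash-based data structures are, at their core, collections of key-value pairs. HashMap works by hashing: put() stores an entry under the bucket selected by the key's hash, and get() uses the same hash to retrieve it.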

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time

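Key points: it implements the Map interface, permits null keys and null values, is unsynchronized, makes no guarantee about iteration order, and does not guarantee that the order stays constant over time.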
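
The null-handling point can be checked with a few lines. This is a minimal sketch; the class name NullKeyDemo and the sample strings are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that contains a null key and a null value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // the null key is allowed
        map.put("k", null);                  // null values are allowed too
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.get(null));        // value for null key
        System.out.println(map.get("k"));         // null, although the key is present
        System.out.println(map.containsKey("k")); // true
    }
}
```

Because get() returns null both for a missing key and for a key mapped to null, containsKey() is the way to tell the two cases apart.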

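2. Hash code collisions between HashMap keys

The put() method

- If the table has not been allocated yet (new HashMap() does not allocate storage up front), resize() allocates it on the first put, using the initial capacity held in threshold, or the default capacity of 16 if none was given.
- If there is no collision, the new node is placed directly in the bucket.
- If there is a collision, the new node is appended to the bucket's linked list.
- If the list grows too long (it reaches TREEIFY_THRESHOLD, which is 8), treeifyBin() is called to convert the list into a red-black tree. When the table is still shorter than MIN_TREEIFY_CAPACITY (64), treeifyBin() resizes the table instead.
- If a node with the same key already exists, its value is replaced and the old value is returned.
- If the map becomes too full (size exceeds load factor * capacity), the table is resized and the entries are redistributed.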

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
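
From the caller's side, the two exits of putVal() show up as the return value of put(). The sketch below is an illustration with made-up names (PutReturnDemo, run); putIfAbsent() is used because HashMap implements it by calling putVal() with onlyIfAbsent set to true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnDemo {
    // Records what each call returns, rendered as strings.
    static List<String> run() {
        Map<String, String> map = new HashMap<>();
        List<String> results = new ArrayList<>();
        results.add(String.valueOf(map.put("a", "1")));         // new key: returns null
        results.add(String.valueOf(map.put("a", "2")));         // existing key: returns old value "1"
        results.add(String.valueOf(map.putIfAbsent("a", "3"))); // key present: value kept, returns "2"
        results.add(map.get("a"));                              // still "2"
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 2]
    }
}
```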

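Computing the index
- If the key is null, the hash is 0.
- Otherwise, the hash is the key's hashCode XORed with its own upper 16 bits. This addresses the case where the table is small and the high bits would otherwise take no part in the index calculation.
- The index is the bitwise AND of the hash and the table length minus 1, where the table length is always a power of two (2^x). "&" is used instead of "%" because "%" is more expensive.

```
// Index computation:
    i = (n - 1) & hash

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
```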
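
A small worked example makes the effect of the XOR step visible. The class HashIndexDemo and the two sample hash codes are assumptions chosen for illustration; spread() repeats the mixing step of hash() on a raw int, and indexFor() is the mask shown above.

```java
class HashIndexDemo {
    // Same mixing step as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00010000; // two hash codes that differ only in the high 16 bits
        int h2 = 0x00020000;

        // Without spreading, the mask sees only the low 4 bits: both land in bucket 0.
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        // With spreading, the high bits are folded in and the keys separate.
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2
        // For a power-of-two n, the mask matches a non-negative modulo.
        System.out.println(indexFor(-7, n) == Math.floorMod(-7, n));                 // true
    }
}
```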

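The get() method

- A node matches when its hash is equal and its key is equal (same reference, or key.equals(k)).
- If the first node of the bucket matches, it is returned immediately.
- If the bucket holds more than one node, the search continues in the linked list or the tree:
  - in a tree, the lookup takes O(log n) time;
  - in a linked list, nodes are compared one by one with key.equals(k), which takes O(n) time.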

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }
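
To see the collision path of getNode() from the outside, one can use two keys with the same hashCode. The strings "Aa" and "BB" both hash to 2112, so they share a bucket; the class name CollisionGetDemo is made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionGetDemo {
    // Stores two keys whose hashCode values are identical.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so it lands in the same bucket
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.get("Cc")); // null
    }
}
```

Both values come back correctly because the lookup compares keys with equals() after the hash has matched.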

3. Resizing a HashMap

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

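HashMap has two important parameters: the load factor and the capacity. When the number of entries exceeds load factor * capacity (the default load factor is 0.75), the table is resized to twice its current capacity. Without resizing, the probability of hash collisions would rise sharply and performance would degrade. On the other hand, a larger capacity is not always better: it lowers the collision probability, but it also makes iterating over the table more expensive and wastes memory.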
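
The arithmetic can be followed with a small model. This is a simulation of the rule described above, not a look into HashMap's internals; it assumes the defaults (capacity 16, load factor 0.75), distinct keys inserted one at a time, and ignores MAXIMUM_CAPACITY. The name ThresholdDemo is made up.

```java
class ThresholdDemo {
    // Models the table capacity after inserting `entries` distinct keys,
    // one put() at a time, into a map created with default settings.
    static int capacityAfter(int entries) {
        int capacity = 16;
        int threshold = (int) (capacity * 0.75f); // 12
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) { // same check as "++size > threshold" in putVal()
                capacity <<= 1;     // double the capacity
                threshold <<= 1;    // and the threshold with it
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(12)); // 16
        System.out.println(capacityAfter(13)); // 32
        System.out.println(capacityAfter(25)); // 64
    }
}
```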

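The resize() method

- The hash stored in a Node is not the raw return value of hashCode(); it is the value produced by hash(), with the upper 16 bits mixed in.
- When an initial capacity is passed to the constructor, it is kept in threshold until the table is allocated.
- If oldCap is greater than or equal to MAXIMUM_CAPACITY (1 << 30), the table is not reallocated; otherwise the capacity doubles and a new table is allocated.
- During a resize, the new index of a node is (newCap - 1) & hash:
  - if (hash & oldCap) == 0, the bucket index stays the same;
  - if (hash & oldCap) != 0, the bucket index becomes: old index + oldCap.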

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
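
The lo/hi split in the loop relies on a single bit of the hash. The sketch below, with a made-up class ResizeSplitDemo and sample hashes chosen for illustration, applies the same test to four hashes that share bucket 5 in a 16-slot table and checks the result against a full recomputation of (newCap - 1) & hash.

```java
class ResizeSplitDemo {
    // New bucket index after the table doubles from oldCap to 2 * oldCap,
    // derived from the old index and the single bit (hash & oldCap).
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all map to bucket 5 while the capacity is 16
        for (int hash : hashes) {
            int viaBit = newIndex(hash, oldCap);
            int viaMask = hash & (2 * oldCap - 1); // full recomputation for comparison
            System.out.println(hash + " -> " + viaBit + " (mask gives " + viaMask + ")");
        }
        // Hashes 5 and 37 stay in bucket 5; hashes 21 and 53 move to bucket 5 + 16 = 21.
    }
}
```

Since the two results always agree, resize() never has to recompute the hash or the full index for each node; one bit test decides between the "lo" list and the "hi" list.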

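4. Why HashMap is not thread-safe
- Thread-unsafety caused by hash collisions
  Suppose two threads store entries at the same time, their keys collide in the same bucket, and both threads obtain the same node as the insertion point. Each thread links its new node after that node, so only one of the two new nodes ends up in the map and the other is lost.
- Thread-unsafety caused by resizing
  Suppose two threads store entries and both find that a resize is needed. Each thread builds its own new table, but only one of the new tables takes effect in the end, so entries placed in the other one can be lost.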
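
The collision case can be replayed deterministically in a single thread. The following is a simulation under stated assumptions, not HashMap code: Node is a hypothetical stand-in for HashMap.Node, and the two "writers" are just two local variables holding the same stale view of the chain.

```java
class LostInsertDemo {
    // Hypothetical stand-in for HashMap.Node, reduced to a key and a next pointer.
    static final class Node {
        final String key;
        Node next;
        Node(String key) { this.key = key; }
    }

    // Replays one unlucky interleaving of two writers on a single bucket
    // and returns the keys left in the chain, joined by commas.
    static String chainAfterRace() {
        Node head = new Node("X"); // the bucket already holds one node

        // Both writers walk the chain and stop at the same tail node.
        Node tailSeenByA = head;
        Node tailSeenByB = head;

        // Writer A links its node, then writer B does the same on its stale view.
        tailSeenByA.next = new Node("A");
        tailSeenByB.next = new Node("B"); // overwrites the link to A

        StringBuilder keys = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (keys.length() > 0) keys.append(',');
            keys.append(n.key);
        }
        return keys.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainAfterRace()); // X,B
    }
}
```

Node A is no longer reachable from the bucket, which is the lost insertion described above. With real threads the outcome depends on timing, so it may or may not occur on a given run.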
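
5. Choosing HashMap keys
Prefer String, Integer, or other wrapper types as HashMap keys, because objects of these types are immutable: once created, their state does not change for the rest of their lifetime. HashMap locates an entry by its hash value and the equals method, so if the state of a stored key changes after insertion, a later get() can return the wrong result.
Choosing keys correctly
- Use String and wrapper types such as Long and Integer.
- Use custom immutable types (for example, a final class whose fields are final).
- If a mutable type is used, make sure hashCode and equals keep returning the same results for as long as the object is used as a key.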
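
A short sketch of what goes wrong with a mutable key. An ArrayList is used as the key here purely for illustration, since its hashCode depends on its contents; the class name MutableKeyDemo is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns the lookup result after the key object has been mutated.
    static String lookupAfterMutation() {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value"); // stored under the hash of ["a"]
        key.add("b");          // the key's hashCode changes after insertion
        return map.get(key);   // looked up under the hash of ["a", "b"]
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
    }
}
```

The entry is still inside the map, but get() searches with the new hash and never compares against the stored node.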
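
6. Differences between HashMap and Hashtable
- HashMap permits null keys and null values; Hashtable does not.
- HashMap is not thread-safe; Hashtable is thread-safe (its methods are synchronized).
- Because Hashtable pays the cost of synchronization on every call, HashMap is faster than Hashtable in a single-threaded environment.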
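
The first difference is easy to observe. In this minimal sketch the helper acceptsNullKey and the class NullSupportDemo are made-up names; the behavior shown relies on Hashtable.put throwing NullPointerException for a null key.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullSupportDemo {
    // Returns true if the given map accepts a null key.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false; // Hashtable rejects null keys (and null values)
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
    }
}
```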