HashMap Source Code Analysis

Counter-Strike大牛

Last modified 2024-04-04 11:27:41

Tags: hashing algorithm, algorithm, HashMap, java

First published 2024-03-17 21:47:55

Article link: https://blog.csdn.net/qq_34972627/article/details/136791054

Perturbation function

Why use a perturbation function

To add randomness: keys spread more evenly across the buckets, which reduces collisions.

Source code analysis

Here is the source code HashMap uses to compute the hash:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

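As the code shows, the hash is the key's `hashCode()` XORed with itself shifted right (unsigned) by 16 bits. An `int` is 32 bits wide, so 16 is exactly half of it. The XOR therefore folds the high 16 bits into the low 16 bits, and the low bits come to depend on the whole hash code.
This value is then ANDed with the map capacity minus one (`(n - 1) & hash`) to get the bucket where the key is stored. That mask keeps only the low bits, which is why mixing the high bits into them matters.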
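To see the effect concretely, here is a minimal Java sketch. `spread` copies the expression from `HashMap.hash()` above. `HashSpreadDemo` and `indexFor` are hypothetical names used only for illustration. The two sample keys have hash codes that differ only in bits 16 and 17. A 16-slot table's mask (`1111`) would otherwise ignore those bits completely.

```java
// Sketch: HashMap-style hash spreading followed by power-of-two index masking.
// HashSpreadDemo and indexFor are illustrative names, not JDK API.
class HashSpreadDemo {
    // Same expression as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return hash & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;                          // table length, mask = 1111
        Integer k1 = 0x10000, k2 = 0x20000;  // Integer.hashCode() returns the value itself
        // Raw hash codes differ only above bit 15, so the mask sees zeros in both: same bucket.
        System.out.println(indexFor(k1.hashCode(), n) + " " + indexFor(k2.hashCode(), n)); // 0 0
        // After spreading, bits 16 and 17 show up in the low bits: different buckets.
        System.out.println(indexFor(spread(k1), n) + " " + indexFor(spread(k2), n));       // 1 2
    }
}
```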

Initial capacity

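The general rule first: the table capacity is always a power of two. If the requested capacity is not one, it is rounded up to the smallest power of two that is greater than or equal to it.
Here is the source: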

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

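In the constructor, a negative capacity throws an `IllegalArgumentException`, and a capacity larger than `MAXIMUM_CAPACITY` is clamped to `MAXIMUM_CAPACITY`. Once the load factor has been validated, `tableSizeFor()` is called. Its result is stored in `threshold` for the time being. The bucket array itself is only allocated later, in the first `resize()` call, which takes this value as the initial capacity.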
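These checks can be observed from outside through the public constructor. The minimal sketch below uses hypothetical class and method names. It relies only on the behavior shown in the constructor source above.

```java
import java.util.HashMap;

// Sketch: observing the argument checks of HashMap(int, float) through the public constructor.
// ConstructorChecksDemo and rejects() are illustrative names.
class ConstructorChecksDemo {
    // Returns true if the constructor throws IllegalArgumentException for these arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));     // true: negative capacity
        System.out.println(rejects(16, 0f));        // true: load factor <= 0
        System.out.println(rejects(16, Float.NaN)); // true: NaN load factor
        // false: clamped to MAXIMUM_CAPACITY, and no table is allocated yet
        System.out.println(rejects(Integer.MAX_VALUE, 0.75f));
    }
}
```

Passing `Integer.MAX_VALUE` succeeds without allocating a huge array. The capacity is clamped, and the table is only created lazily.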
The source of tableSizeFor() is:

/**
  * Returns a power of two size for the given target capacity.
  */
static final int tableSizeFor(int cap) {
  int n = cap - 1;
  n |= n >>> 1;
  n |= n >>> 2;
  n |= n >>> 4;
  n |= n >>> 8;
  n |= n >>> 16;
  return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

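`n |= x` is shorthand for `n = n | x` (bitwise OR): a bit of the result is 1 if that bit is 1 in either operand. Each step copies the leading 1 of `n` into the bits below it, shifting by 1, then 2, 4, 8 and 16 positions. After the five steps, every bit below the highest set bit of `cap - 1` is 1. The result has the form 111……, which is the smallest power of two not less than `cap`, minus one. Returning `n + 1` therefore gives that power of two. Subtracting 1 at the start ensures that a `cap` that is already a power of two comes back unchanged: 16 stays 16 instead of becoming 32.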
For example, if cap is 17 (binary `10001`), the steps give:
int n = cap - 1 = 10000
n |= n >>> 1 = 11000
n |= n >>> 2 = 11110
n |= n >>> 4 = 11111
n |= n >>> 8;  // no change: all bits below the leading 1 are already set
n |= n >>> 16; // no change
This gives 11111, i.e. 31, and the method returns n + 1, i.e. 32.
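To reproduce the trace, here is a small sketch. `tableSizeFor` is copied from the source above. `TableSizeForDemo` and its tracing `main` are hypothetical additions. `Integer.toBinaryString` prints each intermediate value of `n`.

```java
// Sketch: tableSizeFor() copied from the HashMap source above, plus a trace of the 17 -> 32 example.
// TableSizeForDemo and the tracing main() are illustrative additions.
class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // same value as HashMap.MAXIMUM_CAPACITY

    static int tableSizeFor(int cap) {
        int n = cap - 1;  // subtract 1 first so an exact power of two maps to itself
        n |= n >>> 1;     // the leading 1 and the bit below it (if any) are now set
        n |= n >>> 2;     // up to 4 bits from the leading 1 downward are set
        n |= n >>> 4;     // up to 8 bits
        n |= n >>> 8;     // up to 16 bits
        n |= n >>> 16;    // every bit below the leading 1 is now set
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        int n = 17 - 1;
        System.out.println("start    : " + Integer.toBinaryString(n)); // 10000
        for (int shift : new int[] {1, 2, 4, 8, 16}) {
            n |= n >>> shift;
            System.out.println(">>> " + shift + " : " + Integer.toBinaryString(n));
        }
        System.out.println(tableSizeFor(17)); // 32
    }
}
```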
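Why must the capacity be a power of two?
This ties back to the perturbation function above. A power of two minus one is a run of 1s in binary (11111……). So `(n - 1) & hash` keeps exactly the low bits of the spread hash, and every bucket is reachable. For non-negative hashes the result equals `hash % n`, yet it costs only a single AND. Together these spread keys more evenly and reduce collisions. If n were not a power of two, some bits of the mask would be 0, and the buckets whose index has a 1 in those positions could never be used.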
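A quick way to see the difference is to count how many buckets `hash & (n - 1)` can reach. In this sketch the names are hypothetical, and the hashes 0..999 are an arbitrary stand-in input. A table of 16 reaches all 16 buckets. A table of 15 reaches only the even indices, because its mask `1110` always clears the lowest bit. The sketch also checks that, for a power of two, masking matches `Math.floorMod` even for negative hashes.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: why the table length should be a power of two when indexing with hash & (n - 1).
// PowerOfTwoMaskDemo and bucketsUsed are illustrative names, not JDK API.
class PowerOfTwoMaskDemo {
    // Number of distinct bucket indices produced by hash & (n - 1) for hashes 0..samples-1.
    static int bucketsUsed(int n, int samples) {
        Set<Integer> used = new HashSet<>();
        for (int h = 0; h < samples; h++) {
            used.add(h & (n - 1));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // n = 16: mask 1111, every one of the 16 buckets is reachable.
        System.out.println(bucketsUsed(16, 1000)); // 16
        // n = 15: mask 1110, the lowest bit is always cleared, so only 8 of 15 buckets are used.
        System.out.println(bucketsUsed(15, 1000)); // 8
        // For a power of two, masking equals a non-negative modulo, even for negative hashes.
        int h = -123456789;
        System.out.println((h & 15) == Math.floorMod(h, 16)); // true
    }
}
```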

Load factor

/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

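This is HashMap's default load factor, 0.75. The map resizes once the number of mappings exceeds `threshold = capacity * loadFactor` (see `if (++size > threshold) resize();` in putVal()). With the default capacity of 16 the threshold is 12, so the 13th insertion triggers a resize.
As the constructor above shows, the load factor can also be passed in when the map is created.
A smaller load factor keeps buckets emptier, so collisions and long chains are less likely and lookups tend to be faster. The cost is more memory and more frequent resizes. So if you want to trade space for time, you can set a smaller load factor.
On resize, the capacity doubles: `newCap = oldCap << 1`.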
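As a rough illustration, the sketch below models only the threshold check and the capacity doubling, not the real `resize()`. `ResizeScheduleDemo` and `resizePoints` are hypothetical names. The model assumes distinct keys are inserted one at a time, and it ignores `MAXIMUM_CAPACITY`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not the JDK code) of when HashMap resizes for a given capacity and load factor.
// Assumes a power-of-two starting capacity and ignores MAXIMUM_CAPACITY.
class ResizeScheduleDemo {
    // Returns the sizes (number of mappings) at which a resize would be triggered
    // while inserting `inserts` distinct keys one by one.
    static List<Integer> resizePoints(int capacity, float loadFactor, int inserts) {
        List<Integer> points = new ArrayList<>();
        int threshold = (int) (capacity * loadFactor);
        for (int size = 1; size <= inserts; size++) {
            if (size > threshold) {        // mirrors "if (++size > threshold) resize();"
                points.add(size);
                capacity <<= 1;            // newCap = oldCap << 1
                threshold = (int) (capacity * loadFactor);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Default settings: 16 buckets, load factor 0.75, threshold 12.
        System.out.println(resizePoints(16, 0.75f, 100)); // [13, 25, 49, 97]
        // A smaller load factor resizes earlier, trading memory for shorter chains.
        System.out.println(resizePoints(16, 0.5f, 100));  // [9, 17, 33, 65]
    }
}
```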

Splitting elements during resize

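After a resize, the existing elements have to be redistributed into the new table. In JDK 1.7, every entry's bucket index was recomputed during the transfer, and when alternative hashing was enabled the hash itself was recomputed as well. This is relatively costly. JDK 8 optimizes this: stored hash values are reused and never recomputed.
So how does JDK 8 split the elements?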

if (e.next == null)
    newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
        }
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }
}

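This is part of the map's resize method (resize()). The logic is roughly:
If the bucket has no next node (a single node, neither a list nor a tree), its new index is computed directly as `hash & (newCap - 1)`.
If the bucket holds a tree (TreeNode), the tree's own split logic runs. It is not covered in detail here, but it divides the nodes the same way as the list case below. Any resulting half with at most UNTREEIFY_THRESHOLD nodes is turned back into a list.
Otherwise the bucket is a linked list, and `hash & oldCap` is computed for each node. If the result is 0, the node keeps its index j. Otherwise its new index is j + oldCap (the original index plus the old capacity). Nodes keep their relative order in both lists. One bit is enough because `newCap - 1` has exactly one more 1-bit than `oldCap - 1`, and that extra bit is `oldCap` itself.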
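The split can be checked in isolation. For brevity, the sketch below uses plain `int` hashes instead of `Node` objects; that simplification, and the class and method names, are assumptions for illustration. `split` mirrors the lo/hi loop above. `splitIndex` can be compared against computing `hash & (newCap - 1)` directly.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the JDK 8 lo/hi split for one bucket during resize.
// Uses plain int hashes instead of Node objects; ResizeSplitDemo and its methods are illustrative.
class ResizeSplitDemo {
    // New index via the JDK 8 shortcut: stay at j, or move to j + oldCap.
    static int splitIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    // Splits the hashes of one old bucket into [lo, hi], preserving their original order.
    static List<List<Integer>> split(int[] bucket, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : bucket) {
            if ((h & oldCap) == 0) lo.add(h); else hi.add(h);
        }
        List<List<Integer>> result = new ArrayList<>();
        result.add(lo);
        result.add(hi);
        return result;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] bucket5 = {5, 21, 37, 53}; // all have index 5 when the capacity is 16
        List<List<Integer>> parts = split(bucket5, oldCap);
        System.out.println("index 5  : " + parts.get(0)); // [5, 37]
        System.out.println("index 21 : " + parts.get(1)); // [21, 53]
    }
}
```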

CRUD

Insertion

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // Compute the hash and delegate to putVal()
    return putVal(hash(key), key, value, false, true);
}

put() simply calls putVal(), so let's look at the source of putVal():

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // Local variables: table tab, current node p, capacity n, bucket index i
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is empty, initialize it (resize() allocates the table)
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // If the bucket is empty (no other element), create a new node and put it there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The bucket already contains a node
    else {
        Node<K,V> e; K k;
        // Same hash and equal key: remember the node; its value is replaced below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // Different key and the bin is a tree: insert via putTreeVal() into the red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Different key and the bin is a linked list
        else {
            // Walk the list looking for the key or the tail; binCount counts the nodes already visited
            for (int binCount = 0; ; ++binCount) {
                // Reached the last node of the list: append the new node here
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
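                    // If the bin already held at least TREEIFY_THRESHOLD (8) nodes before this insertion, call treeifyBin(). This does not always convert the bin into a red-black tree: if the table capacity is below 64, treeifyBin() resizes instead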
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Found an existing node with an equal key; stop, its value is replaced below
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Replace the old value of the existing key here (unless onlyIfAbsent is set and the old value is non-null)
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of mappings exceeds the resize threshold, resize
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
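Several branches of putVal() are visible through the public API. The sketch below records the return values; its class and method names are hypothetical. `put()` passes `onlyIfAbsent = false`. In the JDK 8 source, `HashMap.putIfAbsent()` calls `putVal()` with `onlyIfAbsent = true`. Note the `oldValue == null` condition, which lets `putIfAbsent()` overwrite a key that is currently mapped to `null`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage sketch: effects of putVal() observable through the public API.
// PutSemanticsDemo and trace() are illustrative names.
class PutSemanticsDemo {
    static List<Integer> trace() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        out.add(map.put("a", 1));          // null: no previous mapping
        out.add(map.put("a", 2));          // 1: old value returned, replaced by 2
        out.add(map.putIfAbsent("a", 3));  // 2: existing non-null value is kept
        out.add(map.get("a"));             // 2
        map.put("b", null);
        out.add(map.putIfAbsent("b", 4));  // null: the old value was null, so it is replaced anyway
        out.add(map.get("b"));             // 4
        map.put(null, 0);                  // a null key is allowed: hash(null) == 0
        out.add(map.get(null));            // 0
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [null, 1, 2, 2, null, 4, 0]
    }
}
```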

Lookup

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

get() computes the hash and then calls getNode():

/**
 * Implements Map.get and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    // Declare local variables
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is non-null, its length is greater than 0, and the bucket for this key is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // The first node in the bucket has the same hash as the key and an equal key
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            // Return that node
            return first;
        // There are more nodes: the bin is a list or a tree
        if ((e = first.next) != null) {
            // Tree bin
            if (first instanceof TreeNode)
                // Search the red-black tree for the node
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Not a tree: walk the linked list to find the node
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
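Two consequences of getNode() can be shown with a short sketch (hypothetical names). First, `"Aa"` and `"BB"` have the same `String.hashCode()` (2112), so they share a bucket and are told apart only by `equals()`. Second, as the Javadoc above says, a `null` from `get()` can mean either no mapping or a mapping to `null`, and `containsKey()` distinguishes the two.

```java
import java.util.HashMap;
import java.util.Map;

// Usage sketch for get()/getNode(): colliding keys and null values.
// GetSemanticsDemo and sampleMap() are illustrative names.
class GetSemanticsDemo {
    static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);   // "Aa" and "BB" have the same String.hashCode() (2112),
        map.put("BB", 2);   // so they land in the same bucket and form a chain
        map.put("k", null); // key explicitly mapped to null
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sampleMap();
        // getNode() walks the chain and uses equals() to tell the two keys apart.
        System.out.println(map.get("Aa") + " " + map.get("BB"));                    // 1 2
        // get() returns null in both cases; containsKey() distinguishes them.
        System.out.println(map.get("k") + " " + map.containsKey("k"));             // null true
        System.out.println(map.get("missing") + " " + map.containsKey("missing")); // null false
    }
}
```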

Removal

/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param  key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

Again, remove() computes the hash and then calls removeNode():

/**
 * Implements Map.remove and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to match if matchValue, else ignored
 * @param matchValue if true only remove if value is equal
 * @param movable if false do not move other nodes while removing
 * @return the node, or null if none
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    // Declare local variables
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // The table is non-empty and the bucket for this key is non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // Same hash and equal key: the first node is the one to remove
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            node = p;
        // Otherwise, if the bin has more nodes (a list or a tree)
        else if ((e = p.next) != null) {
            // Tree bin
            if (p instanceof TreeNode)
                // Search the tree for the node
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // List bin: walk the list to find the node; p ends up as its predecessor
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // Unlink the node (when matchValue is true, only if the value also matches)
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            if (node instanceof TreeNode)
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                tab[index] = node.next;
            else
                p.next = node.next;
            ++modCount;
            --size;
            afterNodeRemoval(node);
            return node;
        }
    }
    return null;
}
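The `matchValue` parameter is what separates the two public removal methods. `remove(key)` passes `false`. `HashMap.remove(key, value)` passes `true` and returns whether a node was removed. A sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage sketch for removeNode(): remove(key) uses matchValue = false,
// remove(key, value) uses matchValue = true. Class and method names are illustrative.
class RemoveSemanticsDemo {
    static List<Object> trace() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        List<Object> out = new ArrayList<>();
        out.add(map.remove("a", 2)); // false: key found but the value does not match
        out.add(map.remove("a", 1)); // true: key and value both match, node unlinked
        out.add(map.remove("a"));    // null: no mapping left
        out.add(map.size());         // 0
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [false, true, null, 0]
    }
}
```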

Commonly used constants

/**
* The default initial capacity - MUST be a power of two.
* Default initial capacity (must be a power of two)
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* Maximum capacity, used when a larger value is specified or passed to a constructor
*/
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
* The load factor used when none specified in constructor.
* Default load factor
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
* The bin count threshold for using a tree rather than list for a
* bin.  Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* Threshold for converting a list bin into a tree (treeify)
*/
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
* Threshold for converting a tree bin back into a list (untreeify)
*/
static final int UNTREEIFY_THRESHOLD = 6;

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
* Minimum table capacity for treeification
*/
static final int MIN_TREEIFY_CAPACITY = 64;
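To summarize how the three tree-related constants interact, here is a simplified model of the count-based decisions made in putVal() and during a resize split. The names are hypothetical and not JDK API. The actual conversions are done by `treeifyBin()`, `TreeNode.split()` and `untreeify()`, which this sketch does not reproduce.

```java
// Simplified model (illustrative names, not JDK API) of how the tree-related constants interact.
class TreeifyPolicyDemo {
    static final int TREEIFY_THRESHOLD = 8;
    static final int UNTREEIFY_THRESHOLD = 6;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // A new node is appended to a list bin that held `nodesBefore` nodes.
    // putVal() calls treeifyBin() once binCount >= TREEIFY_THRESHOLD - 1, i.e. the list
    // already had at least 8 nodes; treeifyBin() resizes instead while the table is below 64.
    static Action onAppend(int nodesBefore, int tableLength) {
        if (nodesBefore < TREEIFY_THRESHOLD) return Action.NONE;
        return tableLength < MIN_TREEIFY_CAPACITY ? Action.RESIZE : Action.TREEIFY;
    }

    // During a resize, each half of a split tree bin falls back to a list if it is small enough.
    static boolean untreeifyAfterSplit(int nodesInHalf) {
        return nodesInHalf <= UNTREEIFY_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 16));        // NONE: the bin simply grows to 8 nodes
        System.out.println(onAppend(8, 16));        // RESIZE: table too small to treeify
        System.out.println(onAppend(8, 64));        // TREEIFY
        System.out.println(untreeifyAfterSplit(6)); // true
    }
}
```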