[Source Code Notes] HashMap (2) - The put, get and remove Methods

安东你的下巴呢

Category: Source Code Notes. Tags: hash algorithm, hash table, data structures

Article link: https://blog.csdn.net/weixin_43686024/article/details/125903182

1. put

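1.1 Definition of the put method

The put method adds the specified key-value pair to the map. If the map already contains the key, the old value is replaced and returned. If the map does not contain the key, null is returned (a null return can also mean that the key was previously mapped to null).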

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
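
The three possible outcomes of put described above can be observed directly. The following is a minimal sketch; the class name PutReturnDemo and the keys used are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    // Collects what successive put calls return, plus the final values.
    static Object[] run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);   // key absent: returns null
        Integer second = map.put("a", 2);  // key present: returns the old value 1
        map.put("b", null);                // HashMap permits null values
        Integer third = map.put("b", 3);   // old value was null: returns null again
        return new Object[] { first, second, third, map.get("a"), map.get("b") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

The first and third calls both return null even though only the first one added a new key, which is why the return value alone cannot tell "new key" apart from "key previously mapped to null".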

1.2 The putVal method: the actual implementation of put

As the code shows, the actual work of put is done in putVal.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0) // table is null or has length 0
        n = (tab = resize()).length; // initialize it by calling resize
    if ((p = tab[i = (n - 1) & hash]) == null) // compute the bucket index for the key's hash
        tab[i] = newNode(hash, key, value, null); // the slot is empty: create a Node and store it
    else { // the slot is not empty
        Node<K,V> e; K k;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p; // the first node has the same hash and an equal key: the key already exists
        else if (p instanceof TreeNode) // the bucket is a red-black tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else { // otherwise the bucket is a linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) { // p has no successor: reached the end of the list
                    p.next = newNode(hash, key, value, null); // append a new node
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st (original JDK comment)
                        treeifyBin(tab, hash); // the list held at least TREEIFY_THRESHOLD nodes before the append
                    break; // leave the loop
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break; // a node with the same hash and an equal key: the key already exists
                p = e; // advance p to the next node and continue
            }
        }
        if (e != null) { // existing mapping for key (original JDK comment)
            V oldValue = e.value; // remember the old value
            if (!onlyIfAbsent || oldValue == null) // always true when called from put, since onlyIfAbsent is false
                e.value = value; // overwrite the existing node's value
            afterNodeAccess(e); // callback invoked after a node has been accessed
            return oldValue; // return the value from before the change
        }
    }
    ++modCount; // count one more structural modification
    if (++size > threshold) // increment the number of key-value pairs
        resize(); // resize if the new size exceeds the threshold
    afterNodeInsertion(evict); // callback invoked after a node has been inserted
    return null; // a new mapping was added
}

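putVal first checks whether the map's table is null or has length 0; if so, it calls resize() to initialize it. The table here is an array of Node, that is, the map's array of buckets. It is initialized on first use, resized when necessary, and its length is always a power of two.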

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

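Next, it checks whether the entry at index (n - 1) & hash of the table is null. Here n is the length of the table, so n - 1 is the largest valid index. The & (bitwise AND) operator compares the two operands bit by bit in binary: a result bit is 1 only when both input bits are 1, otherwise it is 0. For example, suppose n is 16 (DEFAULT_INITIAL_CAPACITY, the default initial capacity) and the hash is 37. Then n - 1 is 1111 in binary and 37 is 100101, so 001111 & 100101 = 000101, which is 5.

The purpose of this operation is to map the hash into the valid index range of the table. So why use & rather than the more familiar % (remainder)? A bitwise AND is generally cheaper than an integer division, and because the table length n is always a power of two, (n - 1) & hash equals hash % n for every non-negative hash. For a negative hash, Java's % returns a negative remainder, while the mask still produces an index between 0 and n - 1 (the same value as Math.floorMod(hash, n)). The mask therefore yields a valid index in all cases at lower cost.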
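
To make the index calculation concrete, here is a small sketch that compares the mask with the two modulo variants. IndexDemo and indexFor are illustrative names, not JDK methods, and n is assumed to be a power of two. One caveat worth noting: for negative hashes the plain % operator gives a negative result, so the mask matches Math.floorMod rather than %.

```java
class IndexDemo {
    // Bucket index as computed in putVal; n is assumed to be a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int[] hashes = { 37, 0, 15, 16, -1, -37, Integer.MIN_VALUE };
        for (int h : hashes) {
            System.out.println(h + " -> mask " + indexFor(h, n)
                    + ", floorMod " + Math.floorMod(h, n)
                    + ", % " + (h % n));
        }
    }
}
```

For 37 all three agree on 5; for -37 the mask and floorMod give 11 while % gives -5, which would not be a usable array index.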

1.3 The hash method: reducing hash collisions

So how is the hash itself computed? Look at the hash(Object key) method.

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

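It first calls the key's hashCode() method to obtain an int hash code, then shifts that value right by 16 bits with >>> (unsigned right shift), and finally combines the shifted value with the original using ^ (XOR). An int hash code has 32 bits and the unsigned right shift fills the left side with zeros, so the result keeps the high 16 bits unchanged and replaces the low 16 bits with the XOR of the original high and low halves. This is done because the table length is always a power of two: when the table is small, only the low bits of the hash take part in the index calculation, which makes collisions more likely. For example, suppose the table length is 16 and there are two hash codes: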

a = 0000 0000 0000 0000 0000 0000 0000 0001
b = 0000 0000 0000 0001 0000 0000 0000 0001

Without XORing the high 16 bits into the low 16 bits, (n - 1) & hash gives

1111 & 0000 0000 0000 0000 0000 0000 0000 0001 = 0001
1111 & 0000 0000 0000 0001 0000 0000 0000 0001 = 0001

so the two keys collide. After the XOR step, the hashes become

a = 0000 0000 0000 0000 0000 0000 0000 0001
b = 0000 0000 0000 0001 0000 0000 0000 0000

and (n - 1) & hash gives

1111 & 0000 0000 0000 0000 0000 0000 0000 0001 = 0001
1111 & 0000 0000 0000 0001 0000 0000 0000 0000 = 0000

so the collision is avoided in this case.
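
The worked example above can be replayed in code. HashMap.hash is not public, so this sketch reimplements the same XOR step under the assumed name spread; SpreadDemo and indexFor are likewise illustrative names.

```java
class SpreadDemo {
    // Same bit-spreading step as HashMap.hash, applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n assumed to be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x00000001;
        int b = 0x00010001;
        System.out.println("raw:    " + indexFor(a, n) + " vs " + indexFor(b, n));
        System.out.println("spread: " + indexFor(spread(a), n) + " vs " + indexFor(spread(b), n));
    }
}
```

With the raw hash codes both keys land in bucket 1; after spreading they land in buckets 1 and 0. The step lowers the chance of systematic collisions but cannot rule collisions out, since other pairs of hash codes still map to the same bucket.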

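1.4 Back to putVal

Back in putVal(): the bucket index is computed from the key's hash, and the method checks whether that slot of the bucket array is null. If it is, newNode is called to create a Node, which is stored in that slot. If it is not null, a Node variable e is declared and further checks follow. First, the first node in the bucket is compared with the key; if it matches, that node is assigned to e. Otherwise, if the node is a TreeNode, its putTreeVal method is called and the return value is assigned to e. Otherwise the bucket is a linked list, which is traversed looking for the same key: if a match is found it is assigned to e, and if not (the next node is null), the new key-value pair is appended to the list. After appending, one more check decides whether to call treeifyBin.

Here is the putTreeVal method: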

/**
 * Tree version of putVal.
 */
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab, int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    TreeNode<K,V> root = (parent != null) ? root() : this;
    for (TreeNode<K,V> p = root;;) {
        int dir, ph; K pk;
        if ((ph = p.hash) > h)
            dir = -1;
        else if (ph < h)
            dir = 1;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if ((kc == null &&
                  (kc = comparableClassFor(k)) == null) ||
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K,V> q, ch;
                searched = true;
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K,V> xp = p;
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K,V> xpn = xp.next;
            TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K,V>)xpn).prev = x;
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}

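There is no need to read this method line by line; focus on the three return statements. The first two occur while searching the red-black tree: a node with the same key as the one being put has been found, and that node is returned directly. The third covers the case where the search finishes without a match: the key-value pair is added to the tree and null is returned.

In other words, all three branches do the same job: they determine whether the key being put already exists. If it does, the existing node is assigned to e for the steps that follow; if it does not, the entry is added to the bucket and e stays null. After the branches, if e is not null (a node with the same key was found), the new value is assigned to e.value (this always happens when called from put, because onlyIfAbsent is false) and the old value is returned.

That completes the put operation. Note that after appending a key-value pair to a linked list, putVal checks the list length: when binCount is greater than or equal to TREEIFY_THRESHOLD - 1 (that is, 7), it calls treeifyBin(tab, hash). Because the first node is checked separately and the traversal starts from the second node, binCount == 7 means the list already held 8 nodes before the append, so the bin holds 9 nodes, one more than TREEIFY_THRESHOLD, when treeifyBin is called. treeifyBin then converts the list to a red-black tree, unless the table is shorter than MIN_TREEIFY_CAPACITY (64), in which case it resizes the table instead. In addition, when a new key-value pair pushes the size above the threshold (capacity * load factor), resize() is called on the bucket array. The Javadoc of resize() reads: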

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    //...
}

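The implementation of resize() will be analyzed in a separate article. For now one point is enough: when the number of key-value pairs exceeds the threshold (capacity * load factor), HashMap resizes, doubling the number of buckets, and each element either stays at its original index or moves to a new index offset by a power of two.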
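
The "stay or move by a power of two" property follows from the index formula alone and can be checked without looking inside resize(). The sketch below is not the JDK code; ResizeSplitDemo and its methods are illustrative names, and the old capacity is assumed to be a power of two.

```java
class ResizeSplitDemo {
    // Bucket index for a table of length n (n assumed to be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // True if the entry keeps its index or moves by exactly oldCap when the capacity doubles.
    static boolean staysOrMovesByOldCap(int hash, int oldCap) {
        int oldIdx = indexFor(hash, oldCap);
        int newIdx = indexFor(hash, oldCap * 2);
        return newIdx == oldIdx || newIdx == oldIdx + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share bucket 5 in a 16-slot table.
        for (int h : new int[] { 5, 21, 37, 53 }) {
            System.out.println(h + ": " + indexFor(h, oldCap) + " -> " + indexFor(h, oldCap * 2));
        }
    }
}
```

Doubling the capacity adds exactly one bit to the mask, so the new index differs from the old one only in that bit: hashes 5 and 37 stay at index 5, while 21 and 53 move to 5 + 16 = 21.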

2. get

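2.1 Definition of the get method

The get method returns the value mapped to the specified key, or null if the map has no mapping for the key. More formally, if the map contains a mapping from a key k to a value v such that (key==null ? k==null : key.equals(k)), this method returns v; otherwise it returns null. (There can be at most one such mapping.) A null return does not necessarily mean that the map has no mapping for the key, because the key may be mapped to null itself; containsKey can be used to tell the two cases apart.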

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
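
The ambiguity of a null return, and the containsKey check the Javadoc recommends, can be shown in a few lines. This is a minimal sketch; GetNullDemo and the key names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class GetNullDemo {
    static boolean[] run() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);              // key exists, value is null
        return new boolean[] {
            map.get("present") == null,        // null: mapped to null
            map.get("absent") == null,         // null: no mapping at all
            map.containsKey("present"),        // containsKey tells them apart
            map.containsKey("absent")
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```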

2.2 The getNode method: the actual implementation of get

As the code shows, the actual work of get is done in getNode.

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // check that table is not null, its length is greater than 0, and the bucket for this hash is not null
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first; // the first node holds the key: return it
        if ((e = first.next) != null) {
            if (first instanceof TreeNode) // the bucket is a red-black tree
                return ((TreeNode<K,V>)first).getTreeNode(hash, key); // search the tree and return the result
            do { // otherwise the bucket is a linked list
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e; // found: return the node
            } while ((e = e.next) != null);
        }
    }
    // no matching node was found
    return null;
}

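getNode first checks that the bucket array is not null and that its length is greater than 0, then uses the index calculation (n - 1) & hash to fetch the corresponding bucket and checks whether it is null. If any of these checks fails, the map cannot contain the key and null is returned immediately. Otherwise the first node is checked: if it holds the key, it is returned directly; if not, the rest of the bucket is searched as a linked list or as a red-black tree, depending on its type. A match is returned as soon as it is found, and if there is none the method finally returns null.

2.3 Why the first node is always checked

Note the comment in the source here: always check first node. Recall how much of HashMap's implementation, such as the bit-spreading hash calculation and the resize operation, works to keep hash collisions rare. Both a linked list (more than one node) and a red-black tree are structures for handling collisions, and neither is the desired state. Ideally collisions almost never happen and every bucket holds a single Node, which is why the first node is always checked first. putVal and removeNode (discussed below) carry no such explicit comment, but they also check the first node before anything else, which shows how much this fast path matters.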

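3. remove

3.1 Definition of the remove method

The remove method removes the mapping for the specified key from the map if it is present. It returns the value that was mapped to the key, or null if there was no mapping. As with get, a null return does not necessarily mean that the map had no mapping for the key, because the key may have been mapped to null itself.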

/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param  key key whose mapping is to be removed from the map
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}
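
The return values of remove mirror those of put and get. A minimal sketch, with RemoveReturnDemo and the keys chosen only for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RemoveReturnDemo {
    static Object[] run() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        Integer removed = map.remove("a");    // mapping existed: returns 1
        Integer again = map.remove("a");      // nothing left to remove: returns null
        map.put("n", null);
        Integer nullValue = map.remove("n");  // mapping existed but its value was null
        return new Object[] { removed, again, nullValue, map.size() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

The second and third calls both return null, although only the third one actually removed a mapping.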

3.2 The removeNode method: the actual implementation of remove

final Node<K,V> removeNode(int hash, Object key, Object value, boolean matchValue, boolean movable) {
    Node<K,V>[] tab; Node<K,V> p; int n, index;
    // check that table is not null, its length is greater than 0, and the bucket for this hash is not null
    if ((tab = table) != null && (n = tab.length) > 0 && (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            node = p; // the first node holds the key
        else if ((e = p.next) != null) {
            if (p instanceof TreeNode) // the bucket is a red-black tree
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key); // search the tree
            else { // otherwise the bucket is a linked list
                do {
                    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                        node = e; // found: remember the node and leave the loop
                        break;
                    }
                    p = e; // p trails one node behind e, so it ends up as node's predecessor
                } while ((e = e.next) != null);
            }
        }
        // check whether a node to remove was found
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) { // !matchValue is always true when called from remove
            if (node instanceof TreeNode) // a TreeNode is removed with removeTreeNode
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p) // the node to remove is the first node of the bucket
                tab[index] = node.next; // put node's successor into the bucket slot
            else
                p.next = node.next; // inside a linked list: link node's predecessor to its successor, unlinking node
            ++modCount; // count one more structural modification
            --size; // decrement the number of key-value pairs
            afterNodeRemoval(node); // callback invoked after a node has been removed
            return node; // return the removed node
        }
    }
    // nothing to remove
    return null;
}

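As in get, removeNode first checks that the bucket array is not null and that its length is greater than 0, then uses the index calculation (n - 1) & hash to fetch the corresponding bucket and checks whether it is null. If any of these checks fails, the map cannot contain the key and null is returned immediately. Otherwise a Node variable named node is declared, and the search proceeds as in getNode: the first node is checked first, and if it is not the one to remove, the bucket is searched as a linked list or as a red-black tree. A match is assigned to node; otherwise node stays null. After the search, if node is null the method returns null. If not, the node is removed according to its kind: a TreeNode is removed by calling removeTreeNode; if it is the first node of the bucket, the bucket slot is set to its successor; if it is inside a linked list, its predecessor is linked to its successor, which takes it out of the list. After the removal, the structural modification counter is incremented, the number of key-value pairs is decremented, and the removed node itself is returned.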
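
To tie the three operations together, here is a deliberately simplified sketch of a chained hash table that follows the same search-then-modify pattern for put, get and remove. TinyChainedMap is a hypothetical teaching class, not the JDK implementation: it assumes a fixed table of 16 buckets and has no resize, no red-black trees, and no callbacks.

```java
import java.util.Objects;

// Hypothetical teaching class: fixed capacity, linked lists only.
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // length is a power of two
    private int size;

    // Same spreading step as HashMap.hash.
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    private boolean matches(Node<K, V> node, int hash, Object key) {
        return node.hash == hash && Objects.equals(node.key, key);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(h, key, value, null);   // empty bucket: store directly
        } else {
            while (true) {
                if (matches(p, h, key)) {                 // key exists: replace, return old value
                    V old = p.value;
                    p.value = value;
                    return old;
                }
                if (p.next == null) {                     // reached the tail: append
                    p.next = new Node<>(h, key, value, null);
                    break;
                }
                p = p.next;
            }
        }
        size++;
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (matches(e, h, key)) return e.value;
        }
        return null;
    }

    public V remove(Object key) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> prev = null;
        for (Node<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (matches(e, h, key)) {
                if (prev == null) table[i] = e.next;      // first node of the bucket
                else prev.next = e.next;                  // unlink from the chain
                size--;
                return e.value;
            }
        }
        return null;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> m = new TinyChainedMap<>();
        m.put(1, "one");
        m.put(17, "seventeen");   // 1 and 17 share bucket 1 in a 16-slot table
        System.out.println(m.get(17) + " " + m.remove(1) + " " + m.size());
    }
}
```

Unlike removeNode, this sketch returns the removed value directly instead of the node, and it treats the first node as an ordinary step of the loop rather than as a separate fast path.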