HashMap的一些笔记

最新推荐文章于 2024-09-29 11:59:16 发布

0000Joker0000

最新推荐文章于 2024-09-29 11:59:16 发布

阅读量147

点赞数

分类专栏：基础知识文章标签： hashmap

本文链接：https://blog.csdn.net/Yihchu/article/details/108150812

版权

基础知识专栏收录该内容

2 篇文章 0 订阅

订阅专栏

我的主页: https://yihchu.github.io/

概念

元素(entry): 具体存储数据的东西
桶(bucket): 存放元素的容器，可以是单个元素，可以是一个链表或者树
容量(capacity): 可以放多少个桶
负载因子(Load factor): 当元素的个数大于容量*负载因子(默认.75)，那么需要扩容
树化因子(TREEIFY_THRESHOLD): 当一个桶中的元素超过这个阈值时，会把链表转为红黑树；反过来还有一个反树化因子(UNTREEIFY_THRESHOLD)，当桶中的元素少于这个阈值时，会把树转为链表

以上，其实应该没有中文名，我这为了叫着方便~ 如果不合适，请忍耐~ 毕竟中文才是最博大精深的语言~ 没有之一，不接受反驳~

方法

构造函数

HashMap一共提供了三个构造函数，区别主要在于是否要设定容量和阈值～
源码如下：

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) { // 1
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity); // 2
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) { // 3
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() { // 4
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
    
    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) { // 5
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

说明：

带有两个参数的构造函数，可以指定容量和负载因子。
这个函数的作用是把容量修改到大于等于容量的最小的2的幂，见5
单参数构造函数，传入容量，使用默认的负载因子(.75)
无参构造函数，容量和负载因子均使用默认值(16, .75)
找到大于等于容量的最小的2的幂，为啥要把容量设定为2的幂数呢，见下文~

hash

先来看一下源码～

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

hashCode()返回一个int型数据，然后这里是将这个hashCode的高16位和低16位进行了一个异或运算～
据说，这是一个可以在一定程度上减少冲突，并且运算简单的好方法～又便宜又好吃～
在下面，我们会看见index的计算方式: (n-1) & hash。
n是当前的容量，那么如果当这个n比较小的时候，相当于只有hashCode低位参与了运算，而通过现在的hash函数，相当于把高位也引入到了计算当中～

put

重头戏来了，这应该是我们最关心的方法，还是先上代码～

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) { // 1
        return putVal(hash(key), key, value, false, true);
    }
    /**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) // 2
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null) // 3
            tab[i] = newNode(hash, key, value, null);
        else { // 4
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k)))) // 5
                e = p;
            else if (p instanceof TreeNode) // 6
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else { // 7
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash); // 8
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key // 9
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold) // 10 
            resize(); 
        afterNodeInsertion(evict);
        return null;
    }

说明：

put方法，先计算key的hash值，然后调用putVal方法。
初始化table，在构造函数中，仅仅是指定了容量和负载因子，并没有初始化容器。
这里是，如果对应的索引位置没有元素，那么直接把新元素放到这里。同时可以看到，计算索引的方式是(n - 1) & hash，而不是取余之类的。
如果对应的索引位置，已经有元素了，那么执行下面的逻辑。
如果索引位置的当前元素p的key和新元素一样，那么用e指向这个元素(也就是我们要put进去的key，其实已经存在了)。
当前元素p的key和新元素不一样，同时p又是一个树节点，那么就要调用putTreeVal方法；同时，如果这棵树中，已经包含了新元素的key，那么用e指向他。
同上，不过这个不是树，而是链表的情况。
如果，链表的长度超过树化阈值(TREEIFY_THRESHOLD)，那么就要把链表转化为红黑树
这里检查一下e，他表示的意思是，是否map中已经存在key对应的元素了。这里要根据onlyIfAbsent参数来决定是否用新元素替换老元素。
10.如果元素的数量超过了阈值(threshold)，那么需要进行扩容(resize)。

这里需要注意的是，我们在双参数版本的构造函数中将threshold赋值为容量了，但是在这里的threshold其实是容量*负载因子，这是咋回事呢？
table的初始化是在第一次进行put操作时，调用resize方法完成的，而这个threshold的值，也是在resize方法中被重新赋值的。
详见下文～

resize

resize这个方法，有两个作用～
第一个就是初始化table时需要，另一个就是扩容的时候需要～
每次扩容后的容量都会变成之前的二倍～
扩容实际上是创建了一个新的table，然后还需要把数据迁移过来～
源码如下：

    /**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) { // 1
            if (oldCap >= MAXIMUM_CAPACITY) { // 2
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY) // 3
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold // 4
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults // 5
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) { // 6
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr; // 7
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // 8
        table = newTab;
        if (oldTab != null) { 
            for (int j = 0; j < oldCap; ++j) { // 9
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)  // 10
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode) // 11
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); 
                    else { // preserve order // 12
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

说明：

这里是处理正常扩容的情况(非初始化)。
如果容量已经超过最大容量(MAXIMUM_CAPACITY)了，那就不能扩容了，并把扩容的阈值(threshold)设置为最大，即无法再触发resize方法。
正常扩容成原来的二倍，包括容量和阈值(threshold)，因为这个最大容量值(MAXIMUM_CAPACITY)为1 << 30，所以这里并不会溢出。
这里是处理使用带参构造函数构造的情况。
这里是处理使用无参构造函数构造的情况，均使用默认值。
计算阈值(threshold)。
这里就是修改阈值(threshold)的地方，在构造函数中阈值等于容量，而在这里进行了重新赋值，赋值为容量*负载因子。
创建一个新的扩容后的table。
将数据从老table中转移到新table。
10.老table中对应位置的元素是普通元素，那么直接把它挪过去就好了。
11.如果原来的元素是一棵树，那么需要把树拆为两部分，原来索引的部分，和新索引的部分，如拆分后树的元素过少，还需要将树转为链表～
12.如果原来的元素是一个列表，那么也需要将他拆分为两部分，分别放在原来的索引，和新索引处。

补充：关于转移数据时候的索引：
因为HashMap的容量，永远是2的幂数，而扩容也是扩容到原来的2倍，那么从二进制上来讲，加入原来的容量是2，那么就是00000010，扩容后就变成了4，也就是00000100。
那么，假如以前有两个key，分别是1，5，也就是00000001和00000101，他们两个的索引都是1，而扩容后，他们的索引分别是1和5。
其实，再举几个例子也是一样的，从老table到新table迁移数据的时候，只有两种情况，还在原索引位置，以及原索引位置+原容量。
仔细想想是不是这么一回事，因为扩容后，就是容量左移一位而已，再计算索引的时候，差别也仅仅是看新容量的高位与元素的高位进行与运算是否为1，差的正好也是原来的容量的值而已。
上面代码中，树也好，链表也好，其实都是根据这个原理来计算索引和迁移元素的。
(这也是在初始化时，会自动把容量转换为大于等于我们想要的容量的最小的2的幂所带来的一个很大的好处~)

get

get方法就很简单了，简单说就是计算hash值，然后去table里去取值～
源码如下：

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value; // 1
    }

    /**
     * Implements Map.get and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node // 2
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode) // 3
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do { // 4
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

说明：

计算key的hash，然后调用getNode方法。
计算索引并判断对应位置是否有元素，有的话那第一个元素(first)是不是我们要拿的元素(比较key)。
如果第一个元素(first)不对的话，分两种情况，如果第一个元素是树节点，那么就调用getTreeNode方法。
接上面，如果不是树节点，那就是链表，遍历链表找我们要的元素～

关于多线程

HashMap不是线程安全的，所以在多线程下使用就会出现一些问题～比如死循环，比如元素丢失问题等～
(人家本来就不是线程安全的，非用它干啥呀？生活本该简简单单～简直就是没事找事～)

死循环

这个问题好像是一个很经典的问题～不过仅限于1.8之前的版本～
1.8版本之后，把头插改为了尾插，就把这个问题解决了，但是，好像这个问题太经典了，老是出现在各个地方，无处可逃哇～

原因：
1.7版本的相关代码如下：

void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

transfer方法是在resize后调用的，作用是迁移数据~
我们可以看他这个链表部分，是在遍历过程中，给倒了个顺序(头插)~
所以，假设本来table[1]处的链表为A -> B -> C，那么正常resize后，会变成C -> B -> A(假设恰好没有元素的索引发生变化)。
但是在多线程的情况下，可能会同时触发resize方法~
那么假设有两个线程：Thread1和Thread2~
当Thread1执行到table[1] : A -> null，准备处理B结点时，没有抢占到CPU。
而这时Thread2抢到了，开始执行，并一路跑到了table[1] : B -> A -> null，然后失去CPU。
这时Thread1继续执行，当他执行到table[1] : B -> A -> null后，需要继续处理下一个结点，但是这个下一个结点本来是C，但是却变成了A。
死循环就产生了~

注：可能有人会说，线程有自己的工作内存，一个线程修改了数据，对另一个线程来说应该是不可见的~
这里，我只能说，如果有volatile，synchronized这些关键字，那么线程之间必然可见~
但是即使没有这些关键字，也不能说就没有可见性，该发生的还是会发生，不信可以自己做个小实验验证一下~
好像有这么一句话：任何在多线程下可能发生的错误场景最终一定会发生。

而在1.8以及之后的版本中，不管是处理红黑树，还是链表，都保证原来的顺序~ 就避免了这个问题~
源码如下：

    final Node<K,V>[] resize() {
        ......
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); // 1
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do { // 2
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
        ......
    }

说明：

处理树的过程，见下面代码~
链表的处理过程，因为迁移时索引只有两种情况，还是原索引不变(低位)，原索引+原容量(高位)，所以代码中搞了两个链表，分别对应低位和高位，并保证了原来的顺序不变~

        /**
         * Splits nodes in a tree bin into lower and upper tree bins,
         * or untreeifies if now too small. Called only from resize;
         * see above discussion about split bits and indices.
         *
         * @param map the map
         * @param tab the table for recording bin heads
         * @param index the index of the table being split
         * @param bit the bit of hash to split on
         */
        final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
            TreeNode<K,V> b = this;
            // Relink into lo and hi lists, preserving order
            TreeNode<K,V> loHead = null, loTail = null;
            TreeNode<K,V> hiHead = null, hiTail = null;
            int lc = 0, hc = 0;
            for (TreeNode<K,V> e = b, next; e != null; e = next) {
                next = (TreeNode<K,V>)e.next;
                e.next = null;
                if ((e.hash & bit) == 0) {
                    if ((e.prev = loTail) == null)
                        loHead = e;
                    else
                        loTail.next = e;
                    loTail = e;
                    ++lc;
                }
                else {
                    if ((e.prev = hiTail) == null)
                        hiHead = e;
                    else
                        hiTail.next = e;
                    hiTail = e;
                    ++hc;
                }
            }

            if (loHead != null) {
                if (lc <= UNTREEIFY_THRESHOLD)
                    tab[index] = loHead.untreeify(map);
                else {
                    tab[index] = loHead;
                    if (hiHead != null) // (else is already treeified)
                        loHead.treeify(tab);
                }
            }
            if (hiHead != null) {
                if (hc <= UNTREEIFY_THRESHOLD)
                    tab[index + bit] = hiHead.untreeify(map);
                else {
                    tab[index + bit] = hiHead;
                    if (loHead != null)
                        hiHead.treeify(tab);
                }
            }
        }

说明：
其实跟链表大同小异，核心就是把之前一个位置的元素按照索引分成了两个部分，并保证了顺序~

丢失元素

至于丢失元素这个问题，就很简单了，HashMap并没有去解决这个问题，毕竟本身就不是线程安全的~
简单提一嘴就是，在putVal方法中(见下面代码)~
假设两个线程同时执行到1处，然后第一个线程执行完之后，第二个线程得到了CPU开始执行，那么tab[i]的值就是第二个线程的元素了~
第一个线程的元素被覆盖掉了~
如果要解决这个问题，请使用ConcurrentHashMap~

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        ......
        if ((p = tab[i = (n - 1) & hash]) == null) 
            tab[i] = newNode(hash, key, value, null); // 1
        ......
    }