How HashMap Works

jasper_t

Categories: java, android. Tags: hashmap, data

Article link: https://blog.csdn.net/u014411863/article/details/51546860


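As we all know, a linked list makes insertion and deletion easy but lookup slow, whereas an array makes indexed lookup fast but insertion and deletion slow. HashMap, which we use all the time in development, combines the strengths of both and performs well for lookups as well as for insertions and deletions. Conceptually, we can think of a HashMap as a combination of an array and linked lists, and that is in fact how it is implemented. The methods we use most are get and put, so let's look at how these two work.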

The get() method

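We all know that HashMap stores and retrieves data as key-value pairs. Picture an array of known size, called table, whose elements are HashMapEntry objects. Each HashMapEntry e may have a non-null e.next, which forms a linked list, so to find the value for a key we walk the list at one specific array position. (Consider the extreme case: if every key hashed to the same position, the whole map would degrade into a single linked list. In practice, with reasonable hashCode implementations, keys are spread across the positions and this rarely happens.) OK, let's go straight to the source and see how it is actually written: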

    /**
     * Returns the value of the mapping with the specified key.
     *
     * @param key
     *            the key.
     * @return the value of the mapping with the specified key, or {@code null}
     *         if no mapping for the specified key is found.
     */
    public V get(Object key) {
        if (key == null) {
            HashMapEntry<K, V> e = entryForNullKey;
            return e == null ? null : e.value;
        }

        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];
                e != null; e = e.next) {
            K eKey = e.key;
            if (eKey == key || (e.hash == hash && key.equals(eKey))) {
                return e.value;
            }
        }
        return null;
    }

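The source first checks whether the key is null; a null key is kept in the dedicated entryForNullKey field. For a non-null key, Collections.secondaryHash(key) computes the hash for the key, and the table field is copied into a local HashMapEntry array named tab. The for loop starts at tab[hash & (tab.length - 1)], which is the step described above of using the key's hash to find a position in the array. Because the loop advances with e = e.next, what it actually walks is the linked list hanging off that array slot, and each node of the list is a HashMapEntry object. Inside the loop, the entry's key is assigned to eKey and compared with the key passed to get(). The if condition is satisfied when eKey and key are the same reference, or when the stored hash equals the computed hash and key.equals(eKey) returns true. If either holds, the entry for the key has been found and its value is returned.

If the key of the current e does not satisfy the if condition, the loop continues with e = e.next, that is, it moves on to the next node of the linked list (not the next array slot) and checks again. If the list runs out, get() returns null. That's all there is to get().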
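The index computation hash & (tab.length - 1) used above can be tried in isolation. The following is a minimal sketch, not code from the HashMap source: the class name BucketIndexDemo and the helper indexFor are made up for illustration, and it assumes the table length is a power of two, which is what makes the bit mask equivalent to a non-negative modulo.

```java
final class BucketIndexDemo {
    // Hypothetical helper: maps a hash to a bucket index.
    // Assumes tableLength is a power of two, so (tableLength - 1) is a mask of low bits.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int tableLength = 16;
        int[] hashes = {1, 17, 33, -7};
        for (int hash : hashes) {
            int index = indexFor(hash, tableLength);
            // For a power-of-two length, the mask agrees with Math.floorMod,
            // including for negative hashes.
            System.out.println(hash + " -> bucket " + index
                    + " (floorMod: " + Math.floorMod(hash, tableLength) + ")");
        }
    }
}
```

Hashes 1, 17 and 33 all land in the same bucket here, which is exactly the situation in which the for loop in get() has to walk a chain of more than one entry.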

The put() method

After the walkthrough of get(), put() should be fairly easy to understand, so without further ado, here is the source:

    /**
     * Maps the specified key to the specified value.
     *
     * @param key
     *            the key.
     * @param value
     *            the value.
     * @return the value of any previous mapping with the specified key or
     *         {@code null} if there was no such mapping.
     */
    @Override public V put(K key, V value) {
        if (key == null) {
            return putValueForNullKey(value);
        }

        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                preModify(e);
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }

        // No entry for (non-null) key is present; create one
        modCount++;
        if (size++ > threshold) {
            tab = doubleCapacity();
            index = hash & (tab.length - 1);
        }
        addNewEntry(key, value, hash, index);
        return null;
    }

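As you can see, put() locates the position for the key's hash in the same way: it finds the slot in tab[], then walks the linked list at that slot. If it finds an entry whose hash and key match, it updates e.value and returns the old value; otherwise it adds a new entry via addNewEntry and returns null. The storing process is simple and clear, but it involves one very important question, namely how HashMap grows its capacity, which we look at next.

In the source, the line if (size++ > threshold) checks whether the current size (its value before the increment) is greater than threshold, and if so executes tab = doubleCapacity() to expand the table. Let's first look at threshold; the comment in the source says: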
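The two outcomes of put(), replacing an existing value or adding a new entry, are visible from its return value. The sketch below uses the standard java.util.HashMap through the public Map interface; the class PutReturnDemo and the helper putTwice are hypothetical names introduced only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class PutReturnDemo {
    // Hypothetical helper: puts the same key twice and collects both return values.
    static Integer[] putTwice(Map<String, Integer> map, String key, int first, int second) {
        Integer afterFirst = map.put(key, first);   // no previous mapping: returns null
        Integer afterSecond = map.put(key, second); // existing mapping: returns the old value
        return new Integer[] {afterFirst, afterSecond};
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        Integer[] results = putTwice(map, "a", 1, 2);
        System.out.println("first put returned:  " + results[0]);
        System.out.println("second put returned: " + results[1]);
        System.out.println("final value: " + map.get("a") + ", size: " + map.size());
    }
}
```

The second call does not grow the map: the matching entry is found in the chain and only its value is overwritten.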

    /**
     * The table is rehashed when its size exceeds this threshold.
     * The value of this field is generally .75 * capacity, except when
     * the capacity is zero, as described in the EMPTY_TABLE declaration
     * above.
     */
    private transient int threshold;

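In other words, once the size of the table exceeds threshold, the table has to be rehashed. The threshold is capacity * 0.75, where capacity is the current length of the table rather than only its initial size. Taking a capacity of 16 as an example (the default initial capacity of OpenJDK's HashMap; the Android implementation quoted here may start with a smaller table), the threshold is 12, so the table is expanded once size is greater than 12. Why 16 is not discussed here.

Next, let's look at the expansion itself, the doubleCapacity() method: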
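The arithmetic can be checked with a small simulation. This is only a sketch under stated assumptions: ThresholdDemo, thresholdFor and firstResizingInsertion are hypothetical names, the 0.75 factor is taken from the source comment quoted above, and the loop merely mimics the size++ > threshold test from put() without storing anything.

```java
final class ThresholdDemo {
    // Hypothetical helper: threshold as 0.75 * capacity, per the source comment.
    static int thresholdFor(int capacity) {
        return (int) (capacity * 0.75f);
    }

    // Mimics the check in put(): returns which insertion of a new key
    // (1-based) is the first one to trigger doubleCapacity().
    static int firstResizingInsertion(int capacity) {
        int threshold = thresholdFor(capacity);
        int size = 0;
        int insertion = 1;
        while (true) {
            if (size++ > threshold) { // post-increment: compares the size before this insertion
                return insertion;
            }
            insertion++;
        }
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {4, 16, 32}) {
            System.out.println("capacity " + capacity
                    + ": threshold " + thresholdFor(capacity)
                    + ", first resizing insertion #" + firstResizingInsertion(capacity));
        }
    }
}
```

Because size++ is a post-increment, the comparison uses the size before the new entry is counted, so with a threshold of 12 the insertion that finds 13 entries already present is the one that doubles the table.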

    /**
     * Doubles the capacity of the hash table. Existing entries are placed in
     * the correct bucket on the enlarged table. If the current capacity is,
     * MAXIMUM_CAPACITY, this method is a no-op. Returns the table, which
     * will be new unless we were already at MAXIMUM_CAPACITY.
     */
    private HashMapEntry<K, V>[] doubleCapacity() {
        HashMapEntry<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            return oldTable;
        }
        int newCapacity = oldCapacity * 2;
        HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
        if (size == 0) {
            return newTable;
        }

        for (int j = 0; j < oldCapacity; j++) {
            /*
             * Rehash the bucket using the minimum number of field writes.
             * This is the most subtle and delicate code in the class.
             */
            HashMapEntry<K, V> e = oldTable[j];
            if (e == null) {
                continue;
            }
            int highBit = e.hash & oldCapacity;
            HashMapEntry<K, V> broken = null;
            newTable[j | highBit] = e;
            for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
                int nextHighBit = n.hash & oldCapacity;
                if (nextHighBit != highBit) {
                    if (broken == null)
                        newTable[j | nextHighBit] = n;
                    else
                        broken.next = n;
                    broken = e;
                    highBit = nextHighBit;
                }
            }
            if (broken != null)
                broken.next = null;
        }
        return newTable;
    }

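Focus on the first part of the method. If the current capacity has already reached MAXIMUM_CAPACITY, the table is considered as large as it can get and the old table is returned without expansion. Otherwise, int newCapacity = oldCapacity * 2 gives the capacity of the new table, and makeTable(newCapacity) creates it. The loop that follows moves the existing entries into the new table: because the capacity doubles, an entry from old bucket j lands either in bucket j or in bucket j + oldCapacity, depending on the bit e.hash & oldCapacity. That's it.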
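The expression newTable[j | highBit] in doubleCapacity() can be reproduced outside the class to see where entries go after the table doubles. The following is a simplified sketch: SplitDemo and newIndex are hypothetical names, and it only computes target indices, leaving out the chain relinking that the real method performs.

```java
final class SplitDemo {
    // Hypothetical helper: index of an entry after doubling, computed the way
    // doubleCapacity() does it (old index OR-ed with the newly significant bit).
    // Assumes oldCapacity is a power of two.
    static int newIndex(int hash, int oldCapacity) {
        int j = hash & (oldCapacity - 1);  // bucket in the old table
        int highBit = hash & oldCapacity;  // either 0 or oldCapacity
        return j | highBit;                // either j or j + oldCapacity
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        int[] hashes = {5, 21, 37, 53};    // all in old bucket 5
        for (int hash : hashes) {
            int direct = hash & (2 * oldCapacity - 1); // plain index in the doubled table
            System.out.println("hash " + hash + ": old bucket " + (hash & (oldCapacity - 1))
                    + ", new bucket " + newIndex(hash, oldCapacity)
                    + " (direct computation: " + direct + ")");
        }
    }
}
```

All four hashes share old bucket 5; after doubling, two stay in bucket 5 and two move to bucket 21, which is 5 + 16. The result matches what hash & (newCapacity - 1) would give, so later get() calls find the entries at the expected slot.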

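Why some uses of HashMap require overriding hashCode and equals

In the get(Object key) method above, the key is of type Object, which means the key type of a HashMap is not fixed. What we must guarantee is that keys that are equal according to equals have the same hash code. Keys that are not equal may still share a hash code, but lookups are faster when their hash codes usually differ. To guarantee this when a custom class is used as a key, we need to override hashCode and equals in that class; see the example at http://blog.csdn.net/ranmudaofa/article/details/39483605. You may wonder why this is not needed when String or a primitive wrapper type (such as Integer) is used as the key: those Java library classes already override both methods for us.

OK, that wraps up this introduction to HashMap. Please go easy on me in the comments.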
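As an illustration of the point about custom key classes, here is a minimal sketch. GridPoint and GridPointKeyDemo are hypothetical classes invented for this example, not taken from the linked article; it uses java.util.Objects.hash as one convenient way to derive a hash code from the same fields that equals compares.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class: two GridPoints with the same coordinates count as the same key.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof GridPoint)) {
            return false;
        }
        GridPoint that = (GridPoint) other;
        return x == that.x && y == that.y;
    }

    @Override
    public int hashCode() {
        // Built from the same fields as equals, so equal points get equal hash codes.
        return Objects.hash(x, y);
    }
}

final class GridPointKeyDemo {
    // Stores a value under one GridPoint and looks it up with a different but equal instance.
    static String lookup() {
        Map<GridPoint, String> labels = new HashMap<>();
        labels.put(new GridPoint(1, 2), "A");
        return labels.get(new GridPoint(1, 2));
    }

    public static void main(String[] args) {
        System.out.println("lookup with an equal key: " + lookup());
    }
}
```

Without the two overrides, GridPoint would inherit identity-based equals and hashCode from Object, and the second new GridPoint(1, 2) would not find the entry stored under the first.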