Java Collection Classes: How HashMap Is Implemented

武林飛

Article link: https://blog.csdn.net/u010810431/article/details/72594295

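Some characteristics of HashMap:

1. It stores key-value pairs.

2. It accepts a null key and null values.

3. You do not need to declare its size; it grows automatically. Pairs are stored with put(), and the value for a key is retrieved with get().

4. Storage is unordered: when you print all key-value pairs, they do not necessarily come out in insertion order.

5. Keys are unique: putting an existing key again replaces the value stored earlier.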
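
The characteristics above can be checked with a few lines against the standard java.util.HashMap. This is only a small illustration; the class name HashMapFeaturesDemo and the keys used are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: exercises null keys/values and key replacement.
class HashMapFeaturesDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 0);     // a null key is accepted
        map.put("a", null);   // so is a null value
        map.put("b", 1);
        map.put("b", 2);      // same key again: 2 replaces 1
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());            // 3
        System.out.println(map.get("b"));          // 2
        System.out.println(map.get(null));         // 0
        System.out.println(map.containsKey("a"));  // true
        System.out.println(map.get("a"));          // null
    }
}
```

Note that get() returning null is ambiguous: it can mean "no mapping" or "mapped to null", which is why containsKey() is used for the key "a".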

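These characteristics follow from HashMap's data structure and internal logic, and reading the source code is a good way to understand them thoroughly. I recommend IntelliJ IDEA for this: it is very good for reading source code, among many other advantages that are not covered here.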

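Let's look directly at how HashMap's put() method works (the listings in this article match the JDK 7 implementation; JDK 8 reworked the internals):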

   /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

It starts with the check if (table == EMPTY_TABLE). What is table? Looking up its definition, we find this declaration:

/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

It is an array of Entry<K,V>, initially an empty array. So what is Entry<K,V>?

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
}

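For now we focus only on the core parts. Entry is a static class that holds a key and a value, i.e. a key-value pair, plus a next reference to another Entry. In summary: Entry is the element type of the array, each Entry is one key-value pair, and it holds a reference to the next element, which forms a linked list.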

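put() first checks whether table is empty; if so, it allocates the array. The default array length is 2 to the 4th power (16). We can pass a desired capacity to new HashMap(initialCapacity), and the table length actually used is always a power of two: a requested value that is not a power of two is rounded up to the next one. A power-of-two length lets the array be used to the fullest and reduces the chance of collisions (see Appendix 1).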
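
To make the power-of-two sizing concrete, here is a minimal sketch of rounding a requested capacity up to a power of two. The class TableSizing and the method roundUpToPowerOfTwo are names chosen for this example, not a copy of the JDK source; only Integer.highestOneBit is a real library call.

```java
// Hypothetical helper: rounds a requested capacity up to a power of two.
class TableSizing {
    // 1 << 30 is the largest power of two that fits in a positive int.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int roundUpToPowerOfTwo(int requested) {
        if (requested >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (requested <= 1) {
            return 1;
        }
        // (requested - 1) << 1 lies in [requested, 2 * requested) for exact
        // powers of two and above the next power otherwise; keeping only its
        // highest set bit yields the smallest power of two >= requested.
        return Integer.highestOneBit((requested - 1) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}
```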

Next comes the second check, if (key == null) return putForNullKey(value);, which handles a null key by storing it in table[0].

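Then two variables are computed: int hash = hash(key); int i = indexFor(hash, table.length);. i is the position in the table array where this key-value pair belongs. It is obtained by ANDing the key's hash with table.length-1, which guarantees that i falls within the array's bounds. Next comes this code: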
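
The index computation described here can be tried in isolation. The sketch below assumes a power-of-two table length; the class name IndexForDemo and the sample hash values are made up for illustration.

```java
// Hypothetical demo of mapping a hash to a table index with a bit mask.
class IndexForDemo {
    // Assumes length is a power of two, so length - 1 is a mask of low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(21, 16));        // 5
        System.out.println(21 % 16);                 // 5, same as the mask
        System.out.println(indexFor(-7, 16));        // 9, still inside 0..15
        System.out.println(-7 % 16);                 // -7, not a valid index
        System.out.println(Math.floorMod(-7, 16));   // 9, matches the mask
    }
}
```

Unlike the % operator, the mask never produces a negative index, which is one reason it is safe to apply directly to an arbitrary int hash.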

for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

This checks whether the key being put equals a key already stored in that bucket; if so, the new value replaces the old one and the old value is returned.

If there is no matching key, addEntry(hash, key, value, i) is called to insert a new entry. Let's see how addEntry handles it:

/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

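As the comment says, it adds a new Entry holding the key-value pair to the bucket (a slot of the table array), and it may also resize the table. The first check is the resize logic: size is the number of entries currently in the map, and threshold is the resize threshold (capacity * load factor), i.e. the array length (16 by default) multiplied by the load factor (0.75 by default), which gives 12 by default. The table is doubled only when size has reached threshold and the target bucket is already occupied. The point is to keep the linked lists short, because operating on a list means looping over it, which is comparatively slow. Ideally each table slot holds a single entry whose next is null. In practice, different keys can produce the same i from int hash = hash(key); int i = indexFor(hash, table.length);, which is a collision. The pairs that land in the same slot are then stored in a linked list, and the next field of Entry is what makes that possible.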
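
The threshold arithmetic can be sketched as follows. This is a simplified model, not the real resize logic: it doubles the capacity as soon as size reaches the threshold and ignores the extra "target bucket is occupied" condition, so actual capacities may lag behind it. The class and method names are made up for the example.

```java
// Hypothetical model of threshold-driven doubling (simplified).
class ResizeThresholdDemo {
    static final int DEFAULT_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Capacity after inserting `entries` distinct keys into a default map,
    // doubling whenever the current size has reached the threshold.
    static int capacityAfter(int entries) {
        int capacity = DEFAULT_CAPACITY;
        for (int size = 0; size < entries; size++) {
            if (size >= threshold(capacity, DEFAULT_LOAD_FACTOR)) {
                capacity *= 2;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(capacityAfter(12));    // 16
        System.out.println(capacityAfter(13));    // 32
        System.out.println(capacityAfter(100));   // 256
    }
}
```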

/**
     * Like addEntry except that this version is used when creating entries
     * as part of Map construction or "pseudo-construction" (cloning,
     * deserialization).  This version needn't worry about resizing the table.
     *
     * Subclass overrides this to alter the behavior of HashMap(Map),
     * clone, and readObject.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

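As the createEntry() code shows, the entry currently at table[bucketIndex] becomes the next of the new entry, so the new entry is inserted at the head of the bucket's list.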
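
Head insertion means the most recently added key sits first in its bucket. The sketch below models a single bucket with a made-up Node class (a stand-in for Entry, keeping only key and next) to show the resulting order.

```java
// Hypothetical single-bucket model showing head insertion order.
class HeadInsertionDemo {
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head and returns the list order.
    static String bucketOrder(String... keys) {
        Node head = null;
        for (String key : keys) {
            head = new Node(key, head); // old head becomes the new node's next
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bucketOrder("a", "b", "c")); // c -> b -> a
    }
}
```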

With that, we have a clear picture of how HashMap stores data.

Now let's look at how get() retrieves data:

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

    /**
     * Returns the entry associated with the specified key in the
     * HashMap.  Returns null if the HashMap contains no mapping
     * for the key.
     */
    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

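It is straightforward. If the key is null, it walks the list at table[0] and returns the value of the entry whose key is null. Otherwise it computes the index the same way as when storing, int hash = hash(key); int i = indexFor(hash, table.length);, takes the entry at that index, and walks the linked list to find the entry with an equal key and return its value. That is how a value is found from its key.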
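
Putting put() and get() together, the whole mechanism fits in a short sketch. MiniHashMap below is a made-up teaching class, not the JDK implementation: it uses a fixed 16-slot table, never resizes, and uses hashCode() directly without the extra bit mixing that hash(key) performs.

```java
// Hypothetical minimal chained hash map: fixed size, no resizing.
class MiniHashMap<K, V> {
    private static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    // Simplification: null hashes to 0 (so it lands in table[0]); no bit mixing.
    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    private int indexFor(int h) {
        return h & (table.length - 1);
    }

    private static boolean sameKey(Entry<?, ?> e, int h, Object key) {
        return e.hash == h && (e.key == key || (key != null && key.equals(e.key)));
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (sameKey(e, h, key)) {       // existing key: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(h, key, value, table[i]); // head insertion
        size++;
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (sameKey(e, h, key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> map = new MiniHashMap<>();
        map.put("Aa", 1);   // "Aa" and "BB" have the same hashCode (2112),
        map.put("BB", 2);   // so they collide and share one bucket
        map.put(null, 3);
        System.out.println(map.get("Aa"));     // 1
        System.out.println(map.get("BB"));     // 2
        System.out.println(map.get(null));     // 3
        System.out.println(map.put("Aa", 10)); // 1
        System.out.println(map.size());        // 3
    }
}
```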

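After reading the source, HashMap's structure turns out to be quite simple: it stores data in an array of Entry objects, each holding a key-value pair. An entry's position in the array is obtained by ANDing the key's hash with table.length-1, so once stored, its position can be found again quickly. There is also a capacity setting: once the number of entries in the table reaches the threshold, the table is expanded automatically to twice its previous capacity. Resizing allocates a new table and transfers the entries from the old table into it, which is expensive. So when the number of key-value pairs can be predicted, specify the table capacity when constructing the HashMap to avoid resizing.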
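
One way to apply this advice is to derive the initial capacity from the expected number of entries and the default load factor of 0.75. The helper initialCapacityFor below is a made-up name and the formula is a common rule of thumb, not something taken from the HashMap source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for presizing a HashMap.
class PresizedMapDemo {
    // Smallest capacity whose threshold (capacity * 0.75) covers the entries.
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    static Map<Integer, String> build(int expectedEntries) {
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(expectedEntries));
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100)); // 134
        System.out.println(build(100).size());       // 100
    }
}
```

HashMap rounds the requested 134 up to a table length of 256, whose threshold of 192 leaves room for all 100 entries without a resize.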

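Appendix 1 (excerpted from Jikexueyuan, 极客学院)

When length is always a power of two, h & (length-1) is equivalent to taking h modulo length, i.e. h % length (for non-negative h), but & is more efficient than %. This looks simple, but there is some subtlety to it. Let's illustrate with an example:

Suppose the array lengths are 15 and 16, and the optimized hash codes are 8 and 9. The results of the & operation are as follows: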

h & (table.length-1)	hash		table.length-1	result
8 & (15-1):	1000	&	1110	= 1000
9 & (15-1):	1001	&	1110	= 1000
8 & (16-1):	1000	&	1111	= 1000
9 & (16-1):	1001	&	1111	= 1001

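As the example shows, when 8 and 9 are ANDed with 15-1 (1110) they produce the same result, so they map to the same array position. This is a collision: 8 and 9 are placed in the same slot as a linked list, and a lookup has to walk that list to find 8 or 9, which lowers lookup efficiency. We can also see that with an array length of 15, the hash is ANDed with 15-1 (1110), so the last bit is always 0, and the positions 0001, 0011, 0101, 0111, 1001, 1011, 1101 can never hold an element. That wastes a lot of space. Worse, the number of usable positions is much smaller than the array length, which further raises the chance of collisions and slows lookups. When the array length is 16, a power of two, 2^n - 1 has every bit set to 1 in binary, so the AND simply keeps the low bits of the hash. Combined with the further mixing that hash(int h) applies to the key's hashCode, which folds the high bits in, two keys end up in the same array slot (forming a linked list) only when those low bits of their hashes are identical.

So when the array length is a power of two, different keys are less likely to compute the same index, the data is spread more evenly over the array, and collisions are less likely. Lookups then more often avoid walking a linked list in some slot, so lookup efficiency is higher.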
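
The wasted slots can be counted directly. The sketch below applies h & (length - 1) to a range of sample hashes and collects the indexes that are actually reachable; the class name and the sample range 0..1023 are arbitrary choices for the example.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo: which table indexes can h & (length - 1) ever produce?
class PowerOfTwoMaskDemo {
    static Set<Integer> reachableIndexes(int length) {
        Set<Integer> indexes = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            indexes.add(h & (length - 1));
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Length 15: mask 1110, so every odd index is unreachable.
        System.out.println(reachableIndexes(15)); // [0, 2, 4, 6, 8, 10, 12, 14]
        // Length 16: mask 1111, so all 16 slots are reachable.
        System.out.println(reachableIndexes(16).size()); // 16
    }
}
```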

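Appendix 2 (excerpted from Jikexueyuan, 极客学院)

Two ways to iterate over a HashMap

The first way

    // Requires java.util.HashMap, java.util.Iterator and java.util.Map.
    Map<Object, Object> map = new HashMap<>();
    Iterator<Map.Entry<Object, Object>> iter = map.entrySet().iterator();
    while (iter.hasNext()) {
        // Each entry already carries both the key and the value.
        Map.Entry<Object, Object> entry = iter.next();
        Object key = entry.getKey();
        Object val = entry.getValue();
    }

This is the more efficient way, because no extra lookup is needed per entry. Prefer it.

The second way

    // Requires java.util.HashMap, java.util.Iterator and java.util.Map.
    Map<Object, Object> map = new HashMap<>();
    Iterator<Object> iter = map.keySet().iterator();
    while (iter.hasNext()) {
        Object key = iter.next();
        // Every key costs an additional hash lookup to fetch its value.
        Object val = map.get(key);
    }

This is less efficient, because every key requires an extra get() lookup. Avoid it where possible.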
