HashMap in JDK 7

风华正茂少

Article link: https://blog.csdn.net/syx1065001748/article/details/77565898

To study HashMap, the natural starting point is its source code.

I. A First Look at HashMap

Opening the class, we see this declaration:

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

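[Figure: HashMap class hierarchy]

The declaration already gives the basic relationships among these types: HashMap extends AbstractMap and implements Map, Cloneable, and Serializable.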

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16


    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;


    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;


    /**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};


    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;


    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;


    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;

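These fields are never visible when you use the map, but all of them are essential to how it works.

Everything stored in the map is held as key-value pairs, in the Entry<K,V>[] table shown above.

DEFAULT_INITIAL_CAPACITY is the capacity used at initialization. threshold is capacity * load factor; once the number of entries reaches this value, the table is rehashed, which is the resize operation.

There is also the size field, the number of key-value mappings in the map.

Capacity: the number of buckets in the hash table. The initial capacity is simply the capacity at creation time; in practice it is the length of Entry<K,V>[] table.

Load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased. It reflects how densely the table's space is used: the larger the load factor, the fuller the table, and vice versa. For a hash table that resolves collisions by chaining, the average time to look up an element is O(1 + a), where a is the load factor. A larger load factor therefore uses space more fully at the cost of slower lookups, while a load factor that is too small leaves the table sparse and wastes space. The default load factor is 0.75, and in most cases there is no need to change it.

When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the table is rehashed (its internal data structure is rebuilt) so that it has roughly twice as many buckets.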
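To make the arithmetic concrete, here is a minimal sketch that applies the threshold formula from inflateTable to a few capacities. The class name ThresholdDemo and its method are hypothetical helpers for illustration, not JDK code.

```java
// Illustrative sketch only: ThresholdDemo is a hypothetical name, not part of the JDK.
final class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same formula as in inflateTable: min(capacity * loadFactor, MAXIMUM_CAPACITY + 1).
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int i = 0; i < 4; i++) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, 0.75f));
            capacity *= 2; // each resize doubles the number of buckets
        }
    }
}
```

With the defaults, a 16-bucket table has a threshold of 12, and the threshold doubles along with the capacity (24, 48, 96).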

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
        // constructor and methods omitted
    }

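A quick look at the Entry<K,V> structure:

key and value are the key-value pair we store.

next is the reference used to chain entries into a linked list when a hash collision occurs.

hash holds the hash value computed for the key.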

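II. Constructors

Constructor summary
`HashMap()` Constructs an empty `HashMap` with the default initial capacity (16) and the default load factor (0.75).
`HashMap(int initialCapacity)` Constructs an empty `HashMap` with the specified initial capacity and the default load factor (0.75).
`HashMap(int initialCapacity, float loadFactor)` Constructs an empty `HashMap` with the specified initial capacity and load factor.
`HashMap(Map<? extends K, ? extends V> m)` Constructs a new `HashMap` with the same mappings as the specified `Map`.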
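As a quick usage sketch of the four constructors listed above: the class ConstructorDemo and its helper copyOf are hypothetical names chosen for this example, and the capacities are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: ConstructorDemo and copyOf are hypothetical names.
final class ConstructorDemo {
    // Uses the copy constructor: the new map gets the same mappings as the source.
    static Map<String, Integer> copyOf(Map<String, Integer> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) {
        Map<String, Integer> defaults = new HashMap<>();      // default capacity and load factor
        Map<String, Integer> sized = new HashMap<>(64);       // explicit initial capacity
        Map<String, Integer> tuned = new HashMap<>(64, 0.5f); // explicit capacity and load factor

        defaults.put("a", 1);
        sized.put("b", 2);
        tuned.put("c", 3);

        Map<String, Integer> copy = copyOf(defaults);
        copy.put("d", 4); // the copy is independent of the source map
        System.out.println(defaults.size() + " " + copy.size());
    }
}
```

The copy holds its own entries, so adding to it leaves the source map unchanged.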

1. HashMap illustrated:

2. Source code walkthrough

1) The put method:

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
            /*
             * The table is allocated lazily here, on the first put:
             *
             * private void inflateTable(int toSize) {
             *     // Find a power of 2 >= toSize
             *     int capacity = roundUpToPowerOf2(toSize);
             *
             *     threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
             *     table = new Entry[capacity];
             *     initHashSeedAsNeeded(capacity);
             * }
             */
        }
        if (key == null)
            return putForNullKey(value);
            /*
             * Handling of a null key:
             *
             * private V putForNullKey(V value) {
             *     for (Entry<K,V> e = table[0]; e != null; e = e.next) {
             *         // If a null key already exists, replace its value and return the old one
             *         if (e.key == null) {
             *             V oldValue = e.value;
             *             e.value = value;
             *             e.recordAccess(this);
             *             return oldValue;
             *         }
             *     }
             *     modCount++;
             *     addEntry(0, null, value, 0);
             *     return null;
             * }
             */
        int hash = hash(key);
        int i = indexFor(hash, table.length);
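        // Check whether the bucket at index i already holds an entry with the same hash and an equal key.
        // If it does, the new value overwrites the old one and the old value is returned.
        // Entries that map to the same index but have different keys have collided.
        // After a collision, the bucket holds a chain of entries rather than a single Entry.
        // The chain must be walked in order until the entry is found; if the target sits at the very end
        // of the chain (the earliest entry put into that bucket), the whole chain is traversed.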
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
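inflateTable rounds the requested size up to a power of two before allocating the table. The sketch below shows one way such rounding can be done with Integer.highestOneBit; the class name PowerOfTwoDemo is hypothetical, and the method body is an illustration of the technique rather than a quotation of the JDK source.

```java
// Illustrative sketch only: PowerOfTwoDemo is a hypothetical name, not part of the JDK.
final class PowerOfTwoDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 lies in [number, 2 * number) for number > 1,
        // so its highest set bit is the power of two we want.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 10, 16, 17, 1000}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
    }
}
```

A requested capacity of 10 therefore yields a 16-slot table, while 17 yields 32.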

putForNullKey shows why HashMap accepts null as a key: the null key is handled as a special case before the normal insertion path.
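The behavior is easy to observe from the outside. This small sketch, with the hypothetical class name NullKeyDemo, puts two values under the null key and records what put and get return.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: NullKeyDemo is a hypothetical name.
final class NullKeyDemo {
    // Returns: first put result, second put result, final value, final size.
    static String[] run() {
        Map<String, String> map = new HashMap<>();
        String first = map.put(null, "a");  // no previous mapping for null
        String second = map.put(null, "b"); // replaces "a" and returns it
        return new String[] {first, second, map.get(null), String.valueOf(map.size())};
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s);
        }
    }
}
```

The second put returns the old value and the map still holds a single entry, which matches the replace-and-return loop in putForNullKey.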

Next, look at the addEntry method:

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

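indexFor computes the table index for the given hash.

In particular, the code shows that when key == null the hash is 0 and the entry goes into table[0].

Here we can also see the condition for resizing. Both checks must hold: size has reached threshold, and the target bucket is already occupied: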

 if ((size >= threshold) && (null != table[bucketIndex]))

Then comes the Entry constructor:

/**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
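            // When an Entry lands in an occupied bucket, next = n links it to the existing chain.
            // As seen above, n is table[bucketIndex], i.e. the current head of the chain in that bucket.
            // So the new Entry becomes the head of the chain and points to the old chain;
            // if the bucket was empty, next is null.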
            key = k;
            hash = h;
        }
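The effect of head insertion can be sketched with a stripped-down chain. Node and HeadInsertDemo are hypothetical names for this example; the node keeps only a key and a next reference.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: HeadInsertDemo and Node are hypothetical names.
final class HeadInsertDemo {
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts every key at the head of one chain, then reads the chain head to tail.
    static List<String> chainAfterInserting(String... keys) {
        Node head = null;
        for (String key : keys) {
            head = new Node(key, head); // new node points at the old head
        }
        List<String> order = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            order.add(n.key);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(chainAfterInserting("a", "b", "c")); // prints [c, b, a]
    }
}
```

Within a single bucket, the chain is therefore in reverse insertion order: the newest entry is found first and the oldest last.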

2) The get method:

get is simpler than put.

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        /*
         * The null key is handled separately by getForNullKey().
         */
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    private V getForNullKey() {
        if (size == 0) {
            return null;
        }
        /*
         * The null key is always looked up in table[0].
         */

        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

Now look at the getEntry() method:

final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

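It first computes the table index, then walks the singly linked list in that bucket looking for an entry with an equal key, and returns that Entry.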
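One consequence of get returning null when getEntry finds nothing is that a null result is ambiguous: the key may be absent, or it may be mapped to null. The following sketch, using the hypothetical class name GetDemo, shows that containsKey tells the two cases apart.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: GetDemo is a hypothetical name.
final class GetDemo {
    // Returns: get(present) == null, get(absent) == null,
    //          containsKey(present), containsKey(absent).
    static boolean[] run() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // key exists, value is null
        return new boolean[] {
            map.get("present") == null,
            map.get("absent") == null,
            map.containsKey("present"),
            map.containsKey("absent")
        };
    }

    public static void main(String[] args) {
        for (boolean b : run()) {
            System.out.println(b);
        }
    }
}
```

Both get calls return null, but only the key that was actually stored is reported by containsKey.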

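3) Computing the hash and the table index

HashMap combines an array with linked lists, so is there an extreme case where all the data ends up in a single linked list?

This is unlikely, but it cannot be ruled out. So how does HashMap keep entries evenly distributed across the structure?

It computes the index with the & operator (bitwise AND): h & (length-1)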

/**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

That is the one.

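Suppose length is 16 (a power of two) or 15, and h is 5, 6, or 7.

With length = 16, length-1 = 15 (binary 1111), so 5, 6, and 7 map to indices 5, 6, and 7. With length = 15, length-1 = 14 (binary 1110), so 5 maps to 4 while 6 and 7 both map to 6. The two values land in the same slot of table, which is a collision: 6 and 7 form a linked list in one bucket, and lookups slow down.

Looking at length-1 = 14 in more detail, collisions are frequent: slots 1, 3, 5, 7, 9, 11, and 13 never receive any data, so the usable space shrinks and the chance of collision rises further, which slows lookups.

The reason is that the lowest bit of 14 is 0. In an & operation, a 0 bit forces the result bit to 0 regardless of the other operand, so every index ends in 0 and all data is stored in even-numbered slots. When length = 2^n, length-1 is all ones in its low n bits, so the index is simply the low n bits of the hash. Different hashes are then less likely to collide, the data is spread more evenly across table, and lookups are faster.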
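The comparison can be checked directly. This sketch reuses the indexFor expression from the source and counts how many distinct slots the hashes 0 through 15 reach for each table length; IndexForDemo and usedSlots are hypothetical names for this example.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: IndexForDemo and usedSlots are hypothetical names.
final class IndexForDemo {
    // Same expression as HashMap.indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects the distinct slots reached by the hashes 0..15 for a given table length.
    static Set<Integer> usedSlots(int length) {
        Set<Integer> slots = new TreeSet<>();
        for (int h = 0; h < 16; h++) {
            slots.add(indexFor(h, length));
        }
        return slots;
    }

    public static void main(String[] args) {
        for (int h = 5; h <= 7; h++) {
            System.out.println("h=" + h
                    + " length16=" + indexFor(h, 16)
                    + " length15=" + indexFor(h, 15));
        }
        System.out.println("slots used with length 16: " + usedSlots(16));
        System.out.println("slots used with length 15: " + usedSlots(15));
    }
}
```

With length 16 the sixteen hashes reach sixteen different slots; with length 15 they reach only the eight even-numbered ones, so every slot in use holds two of them.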

4) Iterating over a HashMap

private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }

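The iteration code is not hard to follow. The iterator walks the table from index 0 upward and, within each bucket, follows the chain from head to tail. Because placement is determined by the hash, the most recently added entry may well sit in the chain at the first index, so iteration order does not reflect insertion order.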
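The modCount check in nextEntry and the expectedModCount update in remove can both be seen from user code. In this sketch, IterationDemo and its methods are hypothetical names; it adds keys during iteration to trigger the fail-fast check, then removes entries through the iterator, which is the supported way.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch only: IterationDemo and its methods are hypothetical names.
final class IterationDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adding a new key while iterating changes modCount, so the next call to next() throws.
    static boolean failsFast() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Iterator.remove() resynchronizes expectedModCount, so iteration can continue.
    static int removeOddValues() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("iteration order: " + sample().keySet());
        System.out.println("fails fast: " + failsFast());
        System.out.println("entries left: " + removeOddValues());
    }
}
```

The printed key order depends on the hashes of the keys and the table size, so it should not be relied on.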
