HashMap Source Code Analysis - CSDN Blog

Article link: https://blog.csdn.net/qq_17589253/article/details/102743831

What is HashMap

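The name can be split in two: Hash + Map. HashMap is one implementation of a data structure that stores key-value pairs; it stores and retrieves each pair by computing a hash index from its key. Where a pair ends up inside the HashMap depends on its key's hash index rather than on insertion order (and two different keys can even have the same hash value), so the data in a HashMap is unordered, and its keys are unique.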

What is a hash

Hashing maps a value of arbitrary length to a fixed-length value by means of a hash function.

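In JDK 1.7, HashMap does not use a key's hashCode() directly: it passes it through a supplemental hash function built from bit shifts and XORs. Here is that hash function from the JDK 1.7 HashMap: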

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions.  This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
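The comment above says the extra mixing defends against hash codes that differ only in their high bits. The minimal sketch below makes that concrete. It is a standalone copy of the shift-and-XOR steps, assuming hashSeed is 0 so the String branch never runs. The class name SupplementalHashDemo and the method name spread are hypothetical, and indexFor() (called by put() but not listed in this article) is assumed to be the JDK 1.7 form h & (length - 1). For the hash codes 0x10000, 0x20000 and 0x30000, the raw indices in a 16-slot table are all 0, while the mixed indices come out as 1, 2 and 3.

```java
// Hypothetical standalone copy of JDK 1.7's supplemental hash, hashSeed == 0 path only.
class SupplementalHashDemo {

    // Same shifts and XORs as hash() above, applied to an already-computed hashCode.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Assumed JDK 1.7 indexFor(): for a power-of-two length, keep only the low bits.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // These hash codes differ only in bits 16 and 17; their low 4 bits are all 0.
        int[] codes = {0x10000, 0x20000, 0x30000};
        for (int code : codes) {
            System.out.printf("hashCode=0x%x  without spread -> %d  with spread -> %d%n",
                    code, indexFor(code, 16), indexFor(spread(code), 16));
        }
    }
}
```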

What is a Map

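A data structure that stores key-value (k-v) pairs: k is the index or identifier used to find a pair's data, so keys in a Map must be unique; v is the data bound to that key and stored with it.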

Source code analysis

Questions:

If two keys have the same hash value, will one value overwrite the other?
When does HashMap resize?

(Resizing happens inside put().)

Key methods: put() and get()

Taking the JDK 1.7 source as the example, let's look at put(), get(), and the fields and constants that need explaining.

First, a few of the more important fields:

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

The default initial capacity: a HashMap created with the no-argument constructor has an initial capacity of 16.

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

The maximum capacity of a HashMap (2^30); its capacity can never exceed this value.

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default load factor, used when none is passed to the constructor, is 0.75.

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

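An empty Entry array constant, shared by every HashMap whose table has not been inflated yet, and the hash table itself: an array of Entry whose index is the bucket (hash) index, initially pointing at that empty constant.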

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

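These are: size, the number of stored key-value pairs; threshold, the size at which the HashMap resizes (capacity * load factor; before the table is inflated it holds the initial capacity); loadFactor, the load factor; and modCount, the number of structural modifications, which is used to make iterators fail-fast.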
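modCount is what makes iterators fail-fast. The sketch below runs against whatever java.util.HashMap your JDK ships (not necessarily 1.7, but the mechanism is the same): adding a new key while iterating is a structural modification, so the iterator's next step throws ConcurrentModificationException. A put() that only replaces an existing key's value is not structural. FailFastDemo and modifyDuringIteration are hypothetical names, and the JDK documents fail-fast behavior as best-effort, so it should only be used to detect bugs.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {

    // Returns true if adding a key during iteration triggers ConcurrentModificationException.
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // A new key changes modCount; the iterator compares it with the
                // value it saved when it was created and fails on its next step.
                map.put(key + "-copy", 0);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentModificationException thrown: " + modifyDuringIteration());
    }
}
```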

Next, let's look at HashMap's constructor:

/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

HashMap has four constructors, one with no arguments and three with arguments; whichever one is called, it ends up delegating to the constructor above.

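This constructor takes an initial capacity and a loadFactor (load factor), validates them, and assigns them to the fields loadFactor and threshold. Note that threshold temporarily holds the initial capacity, and init() is an empty hook that HashMap leaves for subclasses.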
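The validation in this constructor can be observed from outside. This sketch calls the real java.util.HashMap(int, float) constructor, whose checks in later JDKs match the code above; ConstructorDemo and rejects() are hypothetical names. A negative capacity, a zero load factor and a NaN load factor are rejected, while an oversized capacity is clamped to MAXIMUM_CAPACITY rather than rejected.

```java
import java.util.HashMap;

class ConstructorDemo {

    // Hypothetical helper: true if HashMap rejects these constructor arguments.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));                // negative initial capacity
        System.out.println(rejects(16, 0f));                   // load factor <= 0
        System.out.println(rejects(16, Float.NaN));            // load factor is NaN
        System.out.println(rejects(Integer.MAX_VALUE, 0.75f)); // clamped, not rejected
    }
}
```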

Is that really all it does, just assigning fields? Is some other step hiding somewhere? No need to worry; let's look at put() next.

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

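Now we can see it: when put() is called with a key-value pair, it first checks whether table is still the empty table and, if so, inflates and initializes it, passing threshold (which holds the initial capacity at this point) as the requested size. So the initialization logic lives in inflateTable():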

/**
 * Inflates the table.
 */
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

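The method above rounds the requested size up to a power of two to get capacity, sets threshold to capacity * loadFactor (capped at MAXIMUM_CAPACITY + 1), allocates table as an Entry array of length capacity, and finally calls initHashSeedAsNeeded().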
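inflateTable() relies on roundUpToPowerOf2(), which this article does not list. The sketch below assumes it does what the call-site comment says (find the smallest power of two >= toSize, capped at MAXIMUM_CAPACITY) and implements that with Integer.highestOneBit; InflateTableSketch and thresholdFor are hypothetical names. With the defaults, the no-argument constructor leaves threshold = 16, so the first put() creates a 16-slot table with threshold 16 * 0.75 = 12.

```java
class InflateTableSketch {

    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= number, capped at MAXIMUM_CAPACITY (assumes number >= 0).
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 has its highest one-bit at the target power of two.
        return number > 1 ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Threshold as computed in inflateTable(): capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1.
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 100}) {
            int capacity = roundUpToPowerOf2(requested);
            System.out.println(requested + " -> capacity " + capacity
                    + ", threshold " + thresholdFor(capacity, 0.75f));
        }
    }
}
```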

/**
 * Initialize the hashing mask value. We defer initialization until we
 * really need it.
 */
final boolean initHashSeedAsNeeded(int capacity) {
    boolean currentAltHashing = hashSeed != 0;
    boolean useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean switching = currentAltHashing ^ useAltHashing;
    if (switching) {
        hashSeed = useAltHashing
            ? sun.misc.Hashing.randomHashSeed(this)
            : 0;
    }
    return switching;
}

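The method above decides whether the hash seed hashSeed should be initialized (switching on the alternative hashing used in hash()). It compares the capacity passed in with the threshold read from the jdk.map.althashing.threshold system property (Holder.ALTERNATIVE_HASHING_THRESHOLD), sets hashSeed accordingly, and returns whether the setting changed. The method is also used when HashMap resizes: its return value tells the resize whether every key's hash must be recomputed, as we will see later.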

Back to the put() method above: let's look at how key-value pairs are stored in a HashMap.


First, put() checks whether key is null. If it is, the value is stored under the null key, and the null key always gets hash index 0:

private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
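The null-key path can be checked against java.util.HashMap directly. In this sketch (NullKeyDemo and putTwiceUnderNull are hypothetical names), the first put(null, ...) returns null because there was no previous mapping, the second returns the old value, and the map still holds a single entry.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {

    // Puts two values under the null key and returns what each put() call returned.
    static String[] putTwiceUnderNull(Map<String, String> map) {
        String first = map.put(null, "v1");  // no previous mapping for null
        String second = map.put(null, "v2"); // replaces "v1" and returns it
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        String[] results = putTwiceUnderNull(map);
        System.out.println(results[0] + " " + results[1] + " " + map.get(null) + " " + map.size());
    }
}
```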

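If key is not null, put() calls hash() and indexFor() to get the key's bucket index, then walks the linked list in that bucket. If a pair with an equal key already exists, its value is replaced; otherwise a new pair is added at the head of that list (resizing first if needed), as the methods below show: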

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}

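At this point the picture is roughly clear. When HashMap stores a pair, a null key gets hash index 0, and a non-null key gets the bucket index computed from its hash. The map then either starts a new chain or checks the existing chain for an equal key and updates it. So a HashMap never holds duplicate keys, and its storage order is not insertion order: it is unordered and its keys are unique.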
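The put() / addEntry() / createEntry() flow can be condensed into a small model. This is a simplified sketch, not the JDK class: ChainedTable is a hypothetical name, the capacity is fixed at 16, there is no resizing, size or modCount, and keys use hashCode() directly instead of the supplemental hash. It keeps the two behaviors described above: an equal key has its value overwritten, and a new key is linked in at the head of its bucket's chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical simplified model of JDK 1.7's buckets: fixed 16 slots, no resize.
class ChainedTable<K, V> {

    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Assumed JDK 1.7 form of indexFor(): keep the low bits of the hash.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Null keys get hash 0 (bucket 0); other keys use hashCode() with no extra mixing.
    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V oldValue = e.value; // equal key found: overwrite and return the old value
                e.value = value;
                return oldValue;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // new key: link in at the head
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    // Keys stored in one bucket, from the head of the chain to the tail.
    List<K> chain(int index) {
        List<K> keys = new ArrayList<>();
        for (Entry<K, V> e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ChainedTable<Integer, String> t = new ChainedTable<>();
        t.put(1, "one");
        t.put(17, "seventeen"); // 17 & 15 == 1, so it shares bucket 1 with key 1
        System.out.println(t.chain(1));
        System.out.println(t.put(1, "uno") + " " + t.get(1));
    }
}
```

Because 17 & 15 == 1, keys 1 and 17 share bucket 1, and head insertion leaves the later key, 17, first in the chain.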

So how is a key-value pair retrieved?

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

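We can see that HashMap looks up a value by the key passed in. If key is null, it searches the chain at hash index 0 for the entry whose key is null and returns its value. Otherwise it computes the key's bucket index again, walks that chain looking for an entry with an equal key, and returns that entry's value; if no such entry is found, it returns null.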
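As the put() Javadoc above notes, a null return is ambiguous: it can mean the key is absent or that the key is mapped to null. This sketch against java.util.HashMap (GetDemo and sample() are hypothetical names) shows containsKey() separating the two cases.

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {

    // A map holding one key that is explicitly mapped to null.
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        // get() returns null both for a key mapped to null and for a missing key...
        System.out.println(map.get("present") + " " + map.get("absent"));
        // ...while containsKey() tells the two cases apart.
        System.out.println(map.containsKey("present") + " " + map.containsKey("absent"));
    }
}
```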

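Having gone through put() and get(), we can answer the first question above. When two keys have the same hash value: if the keys are equal (the same reference, or equals() returns true), the existing pair's value is replaced; if the keys are different, a new pair is added at the head of the corresponding chain, and no value is overwritten.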
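This answer can be checked with two real strings whose hash codes collide: "Aa" and "BB" both have String.hashCode() 2112 (65 * 31 + 97 and 66 * 31 + 66). The sketch uses java.util.HashMap; CollisionDemo and fill() are hypothetical names. Both keys are kept, and only putting "Aa" again replaces a value.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {

    static Map<String, Integer> fill() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa" but a different key: stored separately
        return map;
    }

    public static void main(String[] args) {
        // Two different strings with the same hashCode().
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());

        Map<String, Integer> map = fill();
        Integer old = map.put("Aa", 3); // same key: value replaced, previous value returned
        System.out.println(map.size() + " " + old + " " + map.get("Aa") + " " + map.get("BB"));
    }
}
```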

So how does HashMap resize? When does it happen, and by how much?

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}



void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}



void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

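HashMap resizes inside put(), when addEntry() is about to add a new entry. The condition is that size has reached or exceeded threshold and the bucket the new entry maps to is already non-empty (null != table[bucketIndex]). If the table is already at MAXIMUM_CAPACITY, resize() only sets threshold to Integer.MAX_VALUE and the table stops growing. Otherwise the capacity is doubled (2 * table.length), and transfer() moves every entry into the new array, recomputing its bucket index (and also its hash, if initHashSeedAsNeeded() reports that the hash seed changed). A single resize therefore walks every stored pair and rebuilds every chain; it is relatively expensive and lowers HashMap's efficiency.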
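Because capacities are powers of two, doubling the table adds exactly one bit to the mask in indexFor(), so during transfer() each entry either keeps its old index i or moves to i + oldCapacity. The sketch below checks that property and prints the threshold after each doubling with the default load factor. ResizeSplitDemo and staysOrShifts are hypothetical names, and indexFor() is assumed to be h & (length - 1) as in JDK 1.7.

```java
class ResizeSplitDemo {

    // Assumed JDK 1.7 form of indexFor().
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // After doubling, an entry keeps its index or moves up by exactly oldCapacity.
    static boolean staysOrShifts(int hash, int oldCapacity) {
        int oldIndex = indexFor(hash, oldCapacity);
        int newIndex = indexFor(hash, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        // Thresholds after successive doublings with the default load factor
        // (ignoring the MAXIMUM_CAPACITY cap that resize() also applies).
        float loadFactor = 0.75f;
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println("capacity " + capacity + " -> threshold " + (int) (capacity * loadFactor));
        }
        // Four hashes that share bucket 5 at capacity 16 split between buckets 5 and 21 at 32.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + ": " + indexFor(hash, 16) + " -> " + indexFor(hash, 32));
        }
    }
}
```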

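Pros and cons

Pros:

It stores pairs in an array plus linked lists, combining fast array indexing with cheap linked-list insertion and deletion.
It finds and stores pairs through a hash function and hash index. Without hash collisions and without resizing, put and get take O(1) time; with collisions they take O(k), where k is the number of nodes traversed in that bucket's chain.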

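Cons:

Hash collisions hurt HashMap's access performance, because a collision means walking the corresponding chain; the quality of the hash algorithm is therefore critical to HashMap.
Resizing hurts HashMap's performance: each resize moves every Entry into a new array and recomputes its bucket index, so frequent resizing lowers efficiency.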
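To see why the hash algorithm matters, here is a key type whose hashCode() is legal (equal keys get equal hashes) but returns the same value for every instance, so every entry lands in one bucket. Under JDK 1.7 that bucket is one long chain, and each lookup degrades to a linear scan over it. ConstantHashDemo and BadKey are hypothetical names; the map still gives correct answers, only more slowly.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {

    // Hypothetical key type with a deliberately poor hashCode().
    static final class BadKey {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return 42; // consistent with equals(), but every key collides
        }
    }

    // Inserts n distinct keys; all of them share one hash value and thus one bucket.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(1_000);
        // Lookups still find the right entry; they just have to search the shared bucket.
        System.out.println(map.size() + " " + map.get(new BadKey(999)));
    }
}
```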

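How to use HashMap

Given these pros and cons, you should try to avoid hash collisions and resizing when using a HashMap. The wrapper classes of the primitive types and String already come with ready-made hashCode() implementations in the JDK; if you use your own object type as a key, make sure its hashCode() is implemented well (and consistently with equals()). Also, if you can estimate roughly how many pairs will be stored, pass a suitable initial capacity when creating the HashMap to avoid resizing and improve efficiency.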
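Both pieces of advice can be sketched together. Point is a hypothetical key class whose equals() and hashCode() are built from the same fields (using java.util.Objects.hash), and capacityFor() is a hypothetical helper that picks an initial capacity large enough that capacity * 0.75 covers the expected number of entries, so the map should not need to resize while it is being filled. The factor 0.75 assumes the default load factor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyAndSizingDemo {

    // Hypothetical key class: equals() and hashCode() use exactly the same fields.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points always produce equal hash codes
        }
    }

    // Hypothetical helper: a capacity whose threshold (capacity * 0.75) covers expectedSize,
    // assuming the default load factor.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1_000;
        Map<Point, String> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(new Point(i, -i), "p" + i);
        }
        // A different but equal Point object finds the stored value.
        System.out.println(map.get(new Point(7, -7)));
    }
}
```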