1.7 HashMap源码分析

最新推荐文章于 2023-12-19 01:17:16 发布

攻城老湿

最新推荐文章于 2023-12-19 01:17:16 发布

阅读量629

点赞数

文章标签：链表 java hashmap 数据结构 python

本文链接：https://blog.csdn.net/yorickjun/article/details/122920354

版权

1.存储结构

HashMap的内部存储结构其实是数组和链表的结合。当实例化一个HashMap时，系统会创建一个长度为Capacity的Entry数组，这个长度被称为容量(Capacity)，在这个数组中可以存放元素的位置我们称之为“桶”(bucket)，每个bucket都有自己的索引，系统可以根据索引快速的查找bucket中的元素。每个bucket中存储一个元素，即一个Entry对象，但每一个Entry对象可以带一个引用变量，用于指向下一个元素，因此，在一个桶中，就有可能生成一个Entry链。 Entry是HashMap的基本组成单元，每一个Entry包含一个key-value键值对。 Entry是HashMap中的一个静态内部类。

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next; //存储指向下一个Entry的引用，单链表结构
        int hash; //对key的hashcode值进行hash运算后得到的值，存储在Entry，避免重复计算

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
            //...
}

2 构造方法

默认构造方法

调用HashMap的默认构造方法，传人默认参数，调用有参的构造方法

// DEFAULT_INITIAL_CAPACITY = 1 << 4 默认的初始容量16
// DEFAULT_LOAD_FACTOR = 0.75f 默认的加载因子0.75
public HashMap() {
  this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

有参构造方法

通过默认构造方法会调用有参构造方法，该方法会对HashMap的加载因子和阈值进行初始化。在常规构造器中，并没有马上为数组table分配内存空间，而是在执行第一次put操作的时候才真正构建table数组。

public HashMap(int initialCapacity, float loadFactor) {
  // 判断初始容量是否不合法
  if (initialCapacity < 0)
    throw new IllegalArgumentException("Illegal initial capacity: " +
                                       initialCapacity);
  if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;
  // 判断加载因子是否不合法
  if (loadFactor <= 0 || Float.isNaN(loadFactor))
    throw new IllegalArgumentException("Illegal load factor: " +
                                       loadFactor);
    // HashMap参数赋值
  this.loadFactor = loadFactor;
  threshold = initialCapacity;
  // 空方法，在其子类如 LinkedHashMap 中就会有对应实现
  init();
}

3 put方法

用户调用HashMap的put方法可以完成将k-v对插入至容器中。

public class TestMap {
    public static void main(String[] args) {
        HashMap<String,String> map = new HashMap<>();
        map.put("a","1"); // 插入数据
        System.out.println(map.get("a"));
    }
}

put方法

put方法完成对数据的插入，若当前数组为空时，需要对数组进行初始化操作。计算key的hash下标后，在对应位置遍历查询key是否存在，存在则重新赋值。最好若不存在对应数值。则将该k-v构建Entry后插入。

public V put(K key, V value) {
  // 如果table数组为空数组{}，进行数组初始化
  if (table == EMPTY_TABLE) { // EMPTY_TABLE = {} 空数组
    // 分配数组空间
    // 入参为threshold，此时threshold为initialCapacity 默认是1<<4(=16)
    inflateTable(threshold);
  }
  // 如果key为null，存储位置为table[0]的数组和链表上
  if (key == null)
    return putForNullKey(value);
  // 对key的hashcode进一步计算，通过异或运算确保散列均匀
  int hash = hash(key);
  // 获取在table中的实际位置下标
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    // 如果该对应数据已存在，执行覆盖操作。用新value替换旧value，并返回旧value
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      V oldValue = e.value;
      e.value = value;
      e.recordAccess(this); //调用value的回调函数，这个函数也为空实现
      return oldValue;
    }
  }
    // 记录修改次数，保证并发访问时，若HashMap内部结构发生变化，快速响应失败
  modCount++;
  // 新增一个entry
  addEntry(hash, key, value, i);
  return null;
}

inflateTable方法

inflateTable方法用于对数组进行初始化操作。

private void inflateTable(int toSize) {
   // Find a power of 2 >= toSize
   // capacity一定是2的次幂,比如toSize=13,则capacity=16
   int capacity = roundUpToPowerOf2(toSize);
     // 依据加载因子为HashMap的扩容阈值赋值
   threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
   // 分配空间
   table = new Entry[capacity];
   // 选择合适的Hash因子
   initHashSeedAsNeeded(capacity);
 }

hash方法

hash方法计算散列值，用了很多的异或，移位等运算，对key的hashcode进一步进行计算以及二进制位的调整等来保证最终获取的存储位置尽量分布均匀。影响HashMap元素的存储位置的只有key的值，与value值无关。

final int hash(Object k) {
  int h = hashSeed;
  // 这里针对String优化了Hash函数，是否使用新的Hash函数和Hash因子有关
  if (0 != h && k instanceof String) {
    return sun.misc.Hashing.stringHash32((String) k);
  }

  h ^= k.hashCode();

  // This function ensures that hashCodes that differ only by
  // constant multiples at each bit position have a bounded
  // number of collisions (approximately 8 at default load factor).
  h ^= (h >>> 20) ^ (h >>> 12);
  return h ^ (h >>> 7) ^ (h >>> 4);
}

indexFor方法

通过indexFor进一步处理来获取实际的存储位置。h &(length-1)保证获取的index一定在数组范围内。取模运算也可以实现，但是位运算对计算机来说性能更高。

static int indexFor(int h, int length) {
  // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
  return h & (length-1);
}

addEntry方法

当size大于阈值并且数组第一个元素位置不为null的时候，需要进行数组扩容，扩容时，需要新建一个长度为之前数组2倍的新的数组，然后将当前的Entry数组中的元素全部传输过去，扩容后的新数组长度为之前的2倍。

void addEntry(int hash, K key, V value, int bucketIndex) {
  if ((size >= threshold) && (null != table[bucketIndex])) {
    // 扩容，新容量为旧容量的2倍
    resize(2 * table.length);
    hash = (null != key) ? hash(key) : 0;
    // 扩容后，计算当前元素的插入位置下标
    bucketIndex = indexFor(hash, table.length);
  }
    // 把元素放入HashMap的桶的对应位置
  createEntry(hash, key, value, bucketIndex);
}

createEntry方法

通过头插法将新的Entry插入。

void createEntry(int hash, K key, V value, int bucketIndex) {
  // 获取待插入位置元素
  Entry<K,V> e = table[bucketIndex];
  // 新插入的元素指向原有元素,并将新的Entry放在数组的第一个位置
  table[bucketIndex] = new Entry<>(hash, key, value, e);
  // 元素个数+1  
  size++;
}

resize方法

扩容操作通过resize操作实现。

void resize(int newCapacity) {
  // 老的数组
  Entry[] oldTable = table;
  // 获取老的容量值  
  int oldCapacity = oldTable.length;
  if (oldCapacity == MAXIMUM_CAPACITY) {
    threshold = Integer.MAX_VALUE;
    return;
  }
    // 新的数组  
  Entry[] newTable = new Entry[newCapacity];
  // 将老的表中的数据拷贝到新的数组中  
  transfer(newTable, initHashSeedAsNeeded(newCapacity));
  // 修改HashMap的底层数组 
  table = newTable;
  // 修改扩容阀值 
  threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

transfer方法

将老的表中的数据拷贝到新的数组中。

void transfer(Entry[] newTable, boolean rehash) {
  int newCapacity = newTable.length;
  // 遍历所有桶
  for (Entry<K,V> e : table) {
    // 遍历桶中所有元素(链表)
    while(null != e) {
      Entry<K,V> next = e.next;
      if (rehash) {
        e.hash = null == e.key ? 0 : hash(e.key);
      }
      // 定位新的Hash桶位置下标(并没有重新计算hash值，而是与新的容量进行&操作，只会在两个位置中选择)
      int i = indexFor(e.hash, newCapacity);
      // 元素连接到桶中,这里相当于单链表的插入，总是插入在最前面
      e.next = newTable[i];
      // newTable[i]的值总是最新插入的值
      newTable[i] = e;
      // 继续遍历下一个元素
      e = next;
    }
  }
}