HashMap Source Code Analysis

  • 1. How HashMap's get() method works in Java

    Hash-based data structures are, at their core, collections of key-value pairs. HashMap works by hashing: put() stores an entry and get() retrieves it.

    Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

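    Key points: it implements the Map interface, permits null keys and null values, is unsynchronized, makes no ordering guarantee, and does not promise that the order stays the same over time.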
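The null-handling point can be tried out directly. The following is a minimal sketch (the class name NullKeyDemo and the method describe() are made up for illustration); it also shows that get() returning null does not by itself tell you whether the key is absent, which is what containsKey() is for.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Stores a null key and a null value, then reads both back.
    static String describe() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // the null key is hashed to 0
        map.put("k", null);                  // a null value is allowed as well
        // get("k") and get("missing") both return null; containsKey tells them apart.
        return map.get(null) + " | " + map.get("k") + " | "
                + map.containsKey("k") + " | " + map.containsKey("missing");
    }

    public static void main(String[] args) {
        System.out.println(describe()); // value for null key | null | true | false
    }
}
```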

  • 2. Hash code collisions between HashMap keys

    • The put() method

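      • If the table has not been allocated yet (new HashMap does not allocate storage immediately), resize() allocates it; the capacity comes from threshold when an initial capacity was requested, otherwise from DEFAULT_INITIAL_CAPACITY (16)
      • If the target bucket is empty, the new node is placed directly in it
      • If there is a collision, the new node is appended to the linked list in that bucket
      • If the list in a bucket grows beyond TREEIFY_THRESHOLD (8) nodes, treeifyBin() converts it to a red-black tree (while the table has fewer than MIN_TREEIFY_CAPACITY (64) buckets, it resizes the table instead)
      • If the key already exists, its value is replaced and oldValue is returned
      • If size exceeds threshold (load factor * capacity) after the insertion, the table is resized and its entries are redistributed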
          public V put(K key, V value) {
              return putVal(hash(key), key, value, false, true);
          }
      
          final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                         boolean evict) {
              Node<K,V>[] tab; Node<K,V> p; int n, i;
              if ((tab = table) == null || (n = tab.length) == 0)
                  n = (tab = resize()).length;
              if ((p = tab[i = (n - 1) & hash]) == null)
                  tab[i] = newNode(hash, key, value, null);
              else {
                  Node<K,V> e; K k;
                  if (p.hash == hash &&
                      ((k = p.key) == key || (key != null && key.equals(k))))
                      e = p;
                  else if (p instanceof TreeNode)
                      e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
                  else {
                      for (int binCount = 0; ; ++binCount) {
                          if ((e = p.next) == null) {
                              p.next = newNode(hash, key, value, null);
                              if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                                  treeifyBin(tab, hash);
                              break;
                          }
                          if (e.hash == hash &&
                              ((k = e.key) == key || (key != null && key.equals(k))))
                              break;
                          p = e;
                      }
                  }
                  if (e != null) { // existing mapping for key
                      V oldValue = e.value;
                      if (!onlyIfAbsent || oldValue == null)
                          e.value = value;
                      afterNodeAccess(e);
                      return oldValue;
                  }
              }
              ++modCount;
              if (++size > threshold)
                  resize();
              afterNodeInsertion(evict);
              return null;
          }
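The return value of putVal() is easiest to see from the caller's side. Below is a small sketch (the class PutReturnDemo is hypothetical): put() returns the previous value or null, and putIfAbsent(), which in the JDK 8 source calls putVal() with onlyIfAbsent set to true, leaves an existing non-null value in place.

```java
import java.util.HashMap;
import java.util.Map;

class PutReturnDemo {
    static String run() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("a", 1);         // no previous mapping, so null
        Integer second = map.put("a", 2);        // replaces 1 and returns it
        Integer third = map.putIfAbsent("a", 3); // keeps 2 and returns it
        return first + "," + second + "," + third + "," + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(run()); // null,1,2,2
    }
}
```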
    • Computing the index

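      • When key is null, the hash is 0
      • When key is not null, the hash is the key's hashCode XORed with its own high 16 bits shifted down (h ^ (h >>> 16)). This addresses the problem that, for a small table, the high bits would otherwise play no part in the index
      • The index is the bitwise AND of the hash with (n - 1), where the table length n is always a power of two (2^x). "%" is not used because it costs more than "&"
      Index computation: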
          i = (n - 1) & hash
      
          static final int hash(Object key) {
              int h;
              return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
          }
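A worked example may help here. The sketch below copies the spreading step into a hypothetical helper class (IndexDemo, spread and indexFor are illustration names, not JDK API) and uses two Integer keys that differ only above bit 15. Without spreading they share bucket 0 in a 16-slot table; with spreading they separate.

```java
class IndexDemo {
    // Same expression as HashMap.hash(): fold the high 16 bits into the low 16.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10000, b = 0x20000; // Integer.hashCode() is the int value itself
        System.out.println(indexFor(a, n) + " " + indexFor(b, n));                 // 0 0
        System.out.println(indexFor(spread(a), n) + " " + indexFor(spread(b), n)); // 1 2
        // For a power-of-two n, the mask gives the same result as a floor modulo.
        System.out.println(indexFor(-7, n) == Math.floorMod(-7, n));               // true
    }
}
```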
    • The get() method

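      • A node matches when its hash is equal and its key is equal (same reference, or key.equals(k))
      • If the first node of the bucket matches, it is returned immediately
      • If the bucket holds more nodes because of collisions, the search continues in the linked list or tree
        • In a tree, the lookup by key takes O(log n) time
        • In a linked list, nodes are checked one by one with key.equals(k), which takes O(n) time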
          public V get(Object key) {
              Node<K,V> e;
              return (e = getNode(hash(key), key)) == null ? null : e.value;
          }
      
          final Node<K,V> getNode(int hash, Object key) {
              Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
              if ((tab = table) != null && (n = tab.length) > 0 &&
                  (first = tab[(n - 1) & hash]) != null) {
                  if (first.hash == hash && // always check first node
                      ((k = first.key) == key || (key != null && key.equals(k))))
                      return first;
                  if ((e = first.next) != null) {
                      if (first instanceof TreeNode)
                          return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                      do {
                          if (e.hash == hash &&
                              ((k = e.key) == key || (key != null && key.equals(k))))
                              return e;
                      } while ((e = e.next) != null);
                  }
              }
              return null;
          }
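To see the equals() check in getNode() doing real work, it helps to use keys with identical hash codes. The strings "Aa", "BB" and "C#" all hash to 2112, so they share a bucket. The sketch below (CollisionLookupDemo is a made-up name) stores two of them and looks up all three.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionLookupDemo {
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode()
                && "BB".hashCode() == "C#".hashCode();
    }

    static String lookup() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second"); // same hash as "Aa", so it is chained in the same bucket
        // "C#" has the same hash too, but equals() fails on every node, so get() returns null.
        return map.get("Aa") + "," + map.get("BB") + "," + map.get("C#");
    }

    public static void main(String[] args) {
        System.out.println(sameHash()); // true
        System.out.println(lookup());   // first,second,null
    }
}
```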
  • 3. HashMap resizing

    An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

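    HashMap has two important parameters: the load factor and the capacity. When the number of entries exceeds load factor * capacity (the default load factor is 0.75), the table is resized to twice its current capacity. Without resizing, hash collisions would become far more likely and HashMap's performance would degrade. On the other hand, a larger capacity is not always better: it lowers the collision probability, but iterating over the table costs more and memory is wasted.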
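The arithmetic behind the threshold can be sketched as follows. This is a simplified model, not the JDK code: ThresholdDemo, threshold() and capacityAfter() are hypothetical helpers, and the model ignores MAXIMUM_CAPACITY and the extra resize that treeifyBin() can trigger on small tables.

```java
class ThresholdDemo {
    // Threshold as described above: capacity * load factor, truncated to int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Capacity after inserting `entries` distinct keys, doubling whenever
    // the new size exceeds the current threshold (mirrors `++size > threshold`).
    static int capacityAfter(int entries, int capacity, float loadFactor) {
        for (int size = 1; size <= entries; size++) {
            if (size > threshold(capacity, loadFactor)) {
                capacity <<= 1;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));          // 12
        System.out.println(capacityAfter(12, 16, 0.75f));  // 16
        System.out.println(capacityAfter(13, 16, 0.75f));  // 32
        System.out.println(capacityAfter(100, 16, 0.75f)); // 256
    }
}
```

With the defaults, the 13th entry is the first one to trigger a doubling, which is why pre-sizing a map for a known number of entries can avoid several rounds of rehashing.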

    • The resize() method

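      • The hash stored in a Node is the result of hash(key), not the raw return value of hashCode()
      • For a map created with an initial capacity, that capacity is held in threshold until the table is first allocated
      • If oldCap is greater than or equal to MAXIMUM_CAPACITY (1 << 30), threshold is set to Integer.MAX_VALUE and the table is not reallocated; otherwise the capacity is doubled and a new table is allocated
      • During resize, each node's new index is (newCap - 1) & hash
        • If (hash & oldCap) == 0, the bucket index is unchanged
        • If (hash & oldCap) != 0, the bucket index becomes: old index + oldCap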
      final Node<K,V>[] resize() {
          Node<K,V>[] oldTab = table;
          int oldCap = (oldTab == null) ? 0 : oldTab.length;
          int oldThr = threshold;
          int newCap, newThr = 0;
          if (oldCap > 0) {
              if (oldCap >= MAXIMUM_CAPACITY) {
                  threshold = Integer.MAX_VALUE;
                  return oldTab;
              }
              else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                       oldCap >= DEFAULT_INITIAL_CAPACITY)
                  newThr = oldThr << 1; // double threshold
          }
          else if (oldThr > 0) // initial capacity was placed in threshold
              newCap = oldThr;
          else {               // zero initial threshold signifies using defaults
              newCap = DEFAULT_INITIAL_CAPACITY;
              newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
          }
          if (newThr == 0) {
              float ft = (float)newCap * loadFactor;
              newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                        (int)ft : Integer.MAX_VALUE);
          }
          threshold = newThr;
          @SuppressWarnings({"rawtypes","unchecked"})
              Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
          table = newTab;
          if (oldTab != null) {
              for (int j = 0; j < oldCap; ++j) {
                  Node<K,V> e;
                  if ((e = oldTab[j]) != null) {
                      oldTab[j] = null;
                      if (e.next == null)
                          newTab[e.hash & (newCap - 1)] = e;
                      else if (e instanceof TreeNode)
                          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                      else { // preserve order
                          Node<K,V> loHead = null, loTail = null;
                          Node<K,V> hiHead = null, hiTail = null;
                          Node<K,V> next;
                          do {
                              next = e.next;
                              if ((e.hash & oldCap) == 0) {
                                  if (loTail == null)
                                      loHead = e;
                                  else
                                      loTail.next = e;
                                  loTail = e;
                              }
                              else {
                                  if (hiTail == null)
                                      hiHead = e;
                                  else
                                      hiTail.next = e;
                                  hiTail = e;
                              }
                          } while ((e = next) != null);
                          if (loTail != null) {
                              loTail.next = null;
                              newTab[j] = loHead;
                          }
                          if (hiTail != null) {
                              hiTail.next = null;
                              newTab[j + oldCap] = hiHead;
                          }
                      }
                  }
              }
          }
          return newTab;
      }
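The lo/hi split in the loop relies on one bit of the hash. A small sketch of that rule, using a hypothetical helper class SplitDemo, checks it against the direct formula (newCap - 1) & hash. Hashes 5 and 21 share bucket 5 in a 16-slot table and separate into buckets 5 and 21 after doubling.

```java
class SplitDemo {
    // New index derived the way resize() splits a bin: test the single bit oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    // Checks the split rule against recomputing the index with the doubled capacity.
    static boolean agrees(int hash, int oldCap) {
        int newCap = oldCap << 1;
        return newIndex(hash, oldCap) == (hash & (newCap - 1));
    }

    public static void main(String[] args) {
        System.out.println(newIndex(5, 16));  // 5  (bit 16 clear: stays in the "lo" list)
        System.out.println(newIndex(21, 16)); // 21 (bit 16 set: moves to the "hi" list)
        System.out.println(agrees(0x7fffabcd, 64)); // true
    }
}
```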
  • 4. Why HashMap is not thread-safe

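    • Thread-unsafety caused by hash collisions
      Suppose two threads store entries at the same time and collide on the same bucket: both obtain the same node as the insertion point, and in the end only one of the two new nodes is actually added.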

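    • Thread-unsafety caused by resizing
      Suppose two threads both find, while storing entries, that the map needs to be resized. Each builds its own new table, but only one of those tables ends up as the map's table, so entries written to the other can be lost.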
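One common way to avoid both problems is to use java.util.concurrent.ConcurrentHashMap when several threads write to the same map. The sketch below (ConcurrentPutDemo and its methods are illustration names) has four threads insert disjoint key ranges; with ConcurrentHashMap the final size is always the full count. Passing a plain HashMap to the same fill() method may lose entries or misbehave in other ways, so the demo deliberately does not do that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutDemo {
    // Starts `threads` workers; each puts `perThread` distinct keys into the shared map.
    static int fill(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per worker
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    static int safeSize() {
        return fill(new ConcurrentHashMap<>(), 4, 10_000);
    }

    public static void main(String[] args) {
        System.out.println(safeSize()); // 40000
    }
}
```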

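  • 5. Choosing keys for HashMap
    Prefer String, Integer, or other wrapper types as HashMap keys, because objects of these types are immutable: once created, their state does not change during their lifetime. HashMap locates an entry by its hash value and the equals method; if the state of a stored key changes, a later get() may fail to find the entry or return the wrong value.
    Choosing keys correctly:

    • Use String or wrapper types such as Long and Integer
    • Use custom immutable types
    • If a mutable type is used, make sure hashCode() and equals() keep returning the same results for as long as the object is a key in the map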
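The failure mode with mutable keys is easy to reproduce. The sketch below uses a hypothetical Key class whose hashCode() and equals() depend on a mutable field. After the field changes, the entry is still in the map but can no longer be found, neither with the mutated key nor with a fresh key carrying the old state.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical mutable key: hashCode() and equals() depend on a field that can change.
    static final class Key {
        int id;
        Key(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    static String run() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key(1);
        map.put(key, "value"); // stored in the bucket for hash 1
        key.id = 2;            // mutate the key after insertion
        // The mutated key now hashes to a different bucket; a fresh Key(1) finds
        // the right bucket but fails equals() against the stored, mutated key.
        return map.get(key) + "," + map.get(new Key(1)) + "," + map.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // null,null,1
    }
}
```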
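  • 6. Differences between HashMap and Hashtable

    • HashMap permits null keys and values; Hashtable does not
    • HashMap is not thread-safe; Hashtable is thread-safe (its methods are synchronized)
    • Because Hashtable synchronizes its methods, HashMap is generally faster than Hashtable in a single-threaded environment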
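The null-handling difference can be checked with a few lines. In this sketch (NullHandlingDemo and acceptsNullKey() are made-up names), the same put(null, ...) call succeeds on a HashMap and throws NullPointerException on a Hashtable.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullHandlingDemo {
    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    static boolean hashMapAccepts() {
        return acceptsNullKey(new HashMap<>());
    }

    static boolean hashtableAccepts() {
        return acceptsNullKey(new Hashtable<>());
    }

    public static void main(String[] args) {
        System.out.println(hashMapAccepts());   // true
        System.out.println(hashtableAccepts()); // false
    }
}
```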