How much do you know about HashMap?

Latest recommended article published on 2021-06-18 00:01:14

new_repo

Tags: hashmap

Article link: https://blog.csdn.net/Mabanana/article/details/107310718

This analysis is based on JDK 8, with comparisons to JDK 7.

HashMap overview

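HashMap implements the Map interface and stores its elements as key-value pairs. It permits null keys and null values; because keys must be unique, at most one entry can have a null key. HashMap makes no guarantee about iteration order, so entries may come back in a different order from the one in which they were inserted. HashMap is also not thread-safe: concurrent modification without external synchronization, most visibly through put, can lose updates or corrupt the map.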
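The null-handling rules are easy to check directly. The following is a minimal sketch; the class name NullKeyDemo and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that uses null both as a key and as a value.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key: replaces "first"
        map.put("k", null);      // null values are allowed as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());    // prints 2
        System.out.println(map.get(null)); // prints second
    }
}
```

Only one entry exists for the null key, because the second put overwrites the first.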

Inheritance hierarchy

public class HashMap<K,V> extends AbstractMap<K,V>
   implements Map<K,V>, Cloneable, Serializable

Basic fields


/**
* The default initial capacity - MUST be a power of two.
* Default initial capacity: 16 (the shift 1 << 4 yields 2^4 = 16).
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
* The maximum capacity, used if a higher value is implicitly specified by either of the constructors with arguments. MUST be a power of two <= 1<<30.
* Maximum capacity of a HashMap: 2^30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
* The load factor used when none specified in constructor.
* Load factor: the table is resized once the number of entries exceeds 0.75 × capacity.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
* The bin count threshold for using a tree rather than list for a
* bin.  Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* 
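* Maximum chain length, i.e. the boundary for converting to a red-black tree: adding an element to a bin that already holds at least 8 nodes converts the chain into a tree (subject to MIN_TREEIFY_CAPACITY).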
*/
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
* 
* When the table is resized and a split tree bin is left with no more than 6 nodes, it is converted back into a linked list.
*/
static final int UNTREEIFY_THRESHOLD = 6;

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
* 
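* Before a bin is treeified there is one more check: conversion happens only when the table capacity (the array length, not the number of key-value pairs) is at least 64; otherwise the table is resized instead. This avoids unnecessary conversions when, early in the table's life, several key-value pairs happen to land in the same bin.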
*/
static final int MIN_TREEIFY_CAPACITY = 64;

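Note: resizing a HashMap is expensive, because every entry has to be redistributed into the new table, so avoid repeated resizes where possible. HashMap is also not thread-safe; in multithreaded code, ConcurrentHashMap is recommended.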
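To illustrate the thread-safety recommendation, here is a minimal sketch in which several threads increment one counter stored in a ConcurrentHashMap. The class name ConcurrentCountDemo, the key "hits", and the thread counts are assumptions chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts `threads` workers that each add 1 to the same key `perThread` times.
    static int countWithThreads(int threads, int perThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge performs the read-modify-write atomically
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // prints 40000
    }
}
```

With a plain HashMap and a get-then-put sequence in place of merge, the final count could come out lower, since updates from different threads may overwrite each other.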

HashMap vs. Hashtable

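1. Thread safety
Hashtable is thread-safe, whereas HashMap is not.
The main reason is that Hashtable's public methods are declared synchronized so that threads access the table one at a time, which makes it somewhat slower than HashMap. Unless there is a specific requirement, HashMap is therefore the usual choice; if a HashMap has to be shared between threads, Collections.synchronizedMap() returns a thread-safe wrapper around it.
2. Handling of null
HashMap accepts null as a key, whereas Hashtable does not: passing a null key (or a null value) to Hashtable's put throws a NullPointerException.
A null key in a HashMap hashes to 0, so it is always stored in the first bucket of the table array (index 0).
3. Inheritance
HashMap extends AbstractMap and implements the Map interface; Hashtable extends the abstract class Dictionary and implements Map.
4. Initial capacity and growth
HashMap's default initial capacity is 16 and Hashtable's is 11; both default to a load factor of 0.75.
When HashMap resizes, it doubles the current capacity (capacity * 2); when Hashtable resizes, it doubles the capacity and adds one (capacity * 2 + 1).
5. Computing the hash
Hashtable uses the key's hashCode directly, clears the sign bit, and takes the remainder modulo the table length: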

int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;

HashMap takes the key's hashCode, shifts it right by 16 bits (unsigned), and XORs the result with the original value:

static final int hash(Object key) {
    int h;
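    // XOR the key's hashCode with itself shifted right (unsigned) by 16 bits to get the hash; putVal then computes the bucket index as (n - 1) & hash.
    // h is a 32-bit int, so the unsigned shift by 16 moves the upper half into the lower half; XORing that with the original value combines the high and low bits.
    // Mixing the high bits into the low bits makes the low bits more varied, so the high-order information still influences the bits that the index mask actually uses.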
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
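The effect of the two index computations can be compared on concrete numbers. This is a sketch under stated assumptions: the class HashSpreadDemo and its method names are invented for illustration, and the two sample keys are chosen to differ only in their upper 16 bits.

```java
class HashSpreadDemo {
    // Same mixing step as the HashMap hash() method quoted above.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // HashMap-style index: mask with (length - 1); length must be a power of two.
    static int maskIndex(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    // Hashtable-style index: clear the sign bit, then take the remainder.
    static int moduloIndex(int hashCode, int tableLength) {
        return (hashCode & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        Integer a = 0x10000, b = 0x20000; // Integer.hashCode() is the value itself
        // Without spreading, both keys fall into bucket 0 of a 16-slot table.
        System.out.println(maskIndex(a.hashCode(), 16) + " " + maskIndex(b.hashCode(), 16)); // prints 0 0
        // With spreading, the high bits reach the index and the keys separate.
        System.out.println(maskIndex(spread(a), 16) + " " + maskIndex(spread(b), 16));       // prints 1 2
    }
}
```

The mask only looks at the low bits of the hash, which is why HashMap folds the high bits down first; the modulo used by Hashtable depends on all bits, so it skips that step.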

HashMap's data storage structure

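1. HashMap stores its data using an array, linked lists, and red-black trees.
HashMap keeps its key-value pairs in an array of entries (Entry in JDK 7, Node in JDK 8). Each key-value pair is one entry object, and each entry is a singly linked list node: its next pointer can reference another entry, which is how hash collisions are resolved.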
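When an element (key-value pair) is added, the hash of its key is computed first and used to determine the array slot. Another element may already occupy that slot; in that case the new element is appended after it, so the two share an array position and form a linked list. Elements in the same chain share the same bucket index, though not necessarily the same full hash value, which is why the array is said to hold linked lists. When a chain grows too long, lookups along it slow down, so it is converted into a red-black tree, which reduces the depth to search and speeds up lookups.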

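When the number of entries exceeds 0.75 (the load factor) times the current capacity, the table is rehashed: the array doubles in size and the entries of the old array are moved into the new one.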
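Collisions can be forced deliberately to see that the map stays correct when every key lands in the same bucket. The sketch below uses a hypothetical key class, BadKey, whose hashCode is constant; nothing here is taken from the JDK source.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type: every instance reports the same hashCode,
    // so all entries end up in a single bucket.
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    // Inserts n distinct keys that all collide.
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("key" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(20);
        System.out.println(map.size());                 // prints 20
        System.out.println(map.get(new BadKey("key7"))); // prints 7
    }
}
```

Lookups still succeed because equals() distinguishes the keys inside the bucket; the cost is that each lookup has to search that one bucket instead of jumping straight to the entry.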

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
              boolean evict) {
   Node<K,V>[] tab; Node<K,V> p; int n, i;
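   //If the table is null or empty, allocate it with resize()
   if ((tab = table) == null || (n = tab.length) == 0)
       n = (tab = resize()).length;
   //If the bucket at (n - 1) & hash is empty, create a node and place it there
   if ((p = tab[i = (n - 1) & hash]) == null)
       tab[i] = newNode(hash, key, value, null);
   //Otherwise there is a collision to handle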
   else {
       Node<K,V> e; K k;
       //Check whether the first node p already holds the key
       if (p.hash == hash &&
           ((k = p.key) == key || (key != null && key.equals(k))))
           e = p;
       else if (p instanceof TreeNode)
           e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
       else {
           for (int binCount = 0; ; ++binCount) {
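               //No next node: append the new node at the tail
               if ((e = p.next) == null) {
                   p.next = newNode(hash, key, value, null);
                   //If the bin already held 8 nodes before this insertion, consider changing its storage structure
                   //treeifyBin first checks the table length: below 64 it only calls resize(); at 64 or more it converts the bin to a red-black tree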
                   if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                       treeifyBin(tab, hash);
                   break;
               }
               //Stop traversing if a node with the same key is found
               if (e.hash == hash &&
                   ((k = e.key) == key || (key != null && key.equals(k))))
                   break;
               p = e;
           }
       }
       //A node with the same key already exists in the bin
       if (e != null) { // existing mapping for key
           V oldValue = e.value;
           if (!onlyIfAbsent || oldValue == null)
               e.value = value;
           afterNodeAccess(e);
           return oldValue;
       }
   }
   ++modCount;
   //If the new size exceeds the threshold (capacity * load factor)
   if (++size > threshold)
   //double the table
       resize();
   afterNodeInsertion(evict);
   return null;
}

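The process above, step by step:
1. Check whether the table array tab[] is null or empty; if so, call resize() to allocate it (with the default size unless a capacity was requested).
2. Compute the hash from the key and derive the array index i from it. If tab[i] == null, create a new node there directly; otherwise go to step 3.
3. Determine whether the bucket resolves collisions with a linked list or a red-black tree (checking the type of the first node is enough) and handle each case accordingly. If the key already exists, its value is overwritten and the old value is returned.
4. After a new node has been inserted, increment size and call resize() if it now exceeds threshold.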
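The steps can be condensed into a toy map that keeps only the array-plus-chain part. This is a deliberately simplified sketch, not the JDK implementation: the class TinyChainedMap is invented, the table is fixed at 16 buckets, there is no resizing and no treeification, and new nodes are prepended rather than appended.

```java
class TinyChainedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size: assumption
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Finds the node for key in its bucket, or returns null.
    private Node<K, V> find(int h, Object key) {
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || (key != null && key.equals(e.key)))) {
                return e;
            }
        }
        return null;
    }

    // Returns the previous value for key, or null if there was none.
    V put(K key, V value) {
        int h = hash(key);
        Node<K, V> e = find(h, key);
        if (e != null) { // existing mapping: overwrite
            V old = e.value;
            e.value = value;
            return old;
        }
        int i = (table.length - 1) & h;
        table[i] = new Node<>(h, key, value, table[i]); // prepend to the chain
        size++;
        return null;
    }

    V get(Object key) {
        Node<K, V> e = find(hash(key), key);
        return e == null ? null : e.value;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyChainedMap<String, Integer> m = new TinyChainedMap<>();
        m.put("a", 1);
        System.out.println(m.put("a", 2)); // prints 1
        System.out.println(m.get("a"));    // prints 2
    }
}
```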

Important methods

Constructors

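public HashMap(int initialCapacity, float loadFactor) //constructor taking an initial capacity and a load factor
public HashMap(int initialCapacity) //constructor taking an initial capacity
public HashMap() //no-argument constructor
public HashMap(Map<? extends K, ? extends V> m) //builds a HashMap from the mappings of an existing Map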

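None of the first three constructors allocates the array. Even after the constructor has run, the table field that will hold the HashMap's elements is still null; it is allocated by resize() on the first put.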
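Because the capacity must be a power of two, a requested initial capacity has to be rounded up before the table is eventually allocated. One way to do that rounding is bit smearing, sketched below; the class and method names (CapacityRounding, roundUpToPowerOfTwo) are made up for this example, and the sketch is an illustration of the technique rather than a quotation of the JDK source.

```java
class CapacityRounding {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= cap, clamped to [1, MAXIMUM_CAPACITY].
    static int roundUpToPowerOfTwo(int cap) {
        int n = cap - 1; // so that an exact power of two maps to itself
        // Copy the highest set bit into every lower position.
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // prints 16
        System.out.println(roundUpToPowerOfTwo(16)); // prints 16
        System.out.println(roundUpToPowerOfTwo(17)); // prints 32
    }
}
```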

Adding entries

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The putVal method that put() delegates to is listed in full earlier in this article.

Retrieving entries

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
/**
  * Implements Map.get and related methods
  *
  * @param hash hash for key
  * @param key the key
  * @return the node, or null if none
  */
final Node<K,V> getNode(int hash, Object key) {
     Node<K,V>[] tab; //the Node array
     Node<K,V> first, e; //first: head node of the bucket the hash maps to
     int n;
     K k;
     //Locate the bucket head by ANDing the hash with n - 1: tab[(n - 1) & hash]
     //All nodes in one bucket share this index (not necessarily the same hash)
     if ((tab = table) != null && (n = tab.length) > 0 &&
         (first = tab[(n - 1) & hash]) != null) {
         //Check whether the first node is the one being looked up
         if (first.hash == hash && // always check first node
             ((k = first.key) == key || (key != null && key.equals(k)))) //both the hash and the key must match
             return first;
         //Check the nodes after first
         if ((e = first.next) != null) {
             if (first instanceof TreeNode)
                 return ((TreeNode<K,V>)first).getTreeNode(hash, key);
             //Walk the rest of the list for a node with the same hash and key
             do {
                 if (e.hash == hash &&
                     ((k = e.key) == key || (key != null && key.equals(k))))
                     return e;
             } while ((e = e.next) != null);
         }
     }
     return null;
 }

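get(key) computes the hash of the key and then hash & (n - 1) to locate the bucket, first = tab[hash & (n - 1)]. It first checks whether first's key equals the argument; if not, it searches the rest of the bucket (the linked list, or the tree if the bucket has been treeified) for a matching key and returns the corresponding value, or null if there is none.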
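One practical consequence of get returning null when no node is found: a null result alone cannot distinguish "key absent" from "key mapped to null". A short sketch, with a made-up class name GetDemo and sample keys:

```java
import java.util.HashMap;
import java.util.Map;

class GetDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", 1);
        map.put("nullValue", null); // key exists, value is null
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(map.get("absent"));            // prints null
        System.out.println(map.get("nullValue"));         // prints null
        System.out.println(map.containsKey("absent"));    // prints false
        System.out.println(map.containsKey("nullValue")); // prints true
    }
}
```

When the distinction matters, containsKey gives the answer that get cannot.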

HashMap's resizing mechanism: resize()

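If no initial size is given when the map is constructed, the default is 16 (a Node array of length 16). Once the number of entries exceeds capacity × load factor, the HashMap is resized to twice its previous capacity.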

/**
* Initializes or doubles table size.  If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
   Node<K,V>[] oldTab = table;
   int oldCap = (oldTab == null) ? 0 : oldTab.length;
   int oldThr = threshold;
   int newCap, newThr = 0;

   /* The old table already has a non-zero capacity */
   if (oldCap > 0) {
       if (oldCap >= MAXIMUM_CAPACITY) {
           threshold = Integer.MAX_VALUE;
           return oldTab;
       }
       /* Double the capacity: newCap = oldCap * 2 */
       else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                oldCap >= DEFAULT_INITIAL_CAPACITY)
           /* Double the threshold as well: newThr = oldThr * 2 */
           newThr = oldThr << 1; // double threshold
   }
   /* The old capacity is 0, i.e. the table is being initialized for the first time */
   else if (oldThr > 0) // initial capacity was placed in threshold
       newCap = oldThr;
   else {               // zero initial threshold signifies using defaults
       newCap = DEFAULT_INITIAL_CAPACITY;
       newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
   }



   if (newThr == 0) {
       float ft = (float)newCap * loadFactor; // new capacity times the load factor
       newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                 (int)ft : Integer.MAX_VALUE);
   }
   threshold = newThr;
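   /* Allocate the new table and move the data into it */
   @SuppressWarnings({"rawtypes","unchecked"})
   Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
   table = newTab; // assign the new table to the table field
   if (oldTab != null) { // the old table is not null, so its entries must be moved over
       /* Walk every bucket of the old table */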
       for (int j = 0; j < oldCap; ++j) {
           Node<K,V> e;
           if ((e = oldTab[j]) != null) {
               oldTab[j] = null;
               if (e.next == null) // a single node with no chain goes straight to index e.hash & (newCap - 1)
                   newTab[e.hash & (newCap - 1)] = e;
               else if (e instanceof TreeNode)
                   ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
               /* Otherwise e heads a singly linked list: walk it, work out each node's position in the new table, and move it */
               else { // preserve order
                   Node<K,V> loHead = null, loTail = null;
                   Node<K,V> hiHead = null, hiTail = null;
                   Node<K,V> next;
	
                   do {
                       next = e.next; // remember the next node
                       // The new table is twice as large, so the chain splits into two lists:
                       // nodes with (e.hash & oldCap) == 0 go to "lo", all others go to "hi"
                       if ((e.hash & oldCap) == 0) {
                           if (loTail == null)
                               loHead = e;
                           else
                               loTail.next = e;
                           loTail = e;
                       }
                       else {
                           if (hiTail == null)
                               hiHead = e;
                           else
                               hiTail.next = e;
                           hiTail = e;
                       }
                   } while ((e = next) != null);
	
                   if (loTail != null) { // lo list is non-empty: it stays at index j
                       loTail.next = null;
                       newTab[j] = loHead;
                   }
                   if (hiTail != null) { // hi list is non-empty: it moves to index j + oldCap
                       hiTail.next = null;
                       newTab[j + oldCap] = hiHead;
                   }
               }
           }
       }
   }
   return newTab;
}
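The lo/hi split relies on one arithmetic fact: when the capacity doubles, a node's new index is either its old index j or j + oldCap, decided by the single bit hash & oldCap. The sketch below checks that claim on a few hashes; the class and method names are invented for illustration.

```java
class ResizeSplitDemo {
    // Index in the old table.
    static int oldIndex(int hash, int oldCap) {
        return hash & (oldCap - 1);
    }

    // Index in the doubled table, computed from scratch.
    static int newIndex(int hash, int oldCap) {
        return hash & ((oldCap << 1) - 1);
    }

    // Index predicted by the lo/hi rule used in resize().
    static int predictedIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share bucket 5 in a 16-slot table.
        for (int hash : new int[] {5, 21, 37, 53}) {
            System.out.println(hash + ": " + oldIndex(hash, oldCap)
                    + " -> " + newIndex(hash, oldCap));
        }
        // prints:
        // 5: 5 -> 5
        // 21: 5 -> 21
        // 37: 5 -> 5
        // 53: 5 -> 21
    }
}
```

Since the outcome depends on one bit only, resize() never has to recompute a full index for chained nodes; it just sorts them into two lists.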

Java 8's red-black tree improvement

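Java 8 optimized the HashMap source. In JDK 7, HashMap always stored colliding entries in a linked list, so when many nodes collided a lookup took O(n) time.
In Java 8, HashMap added red-black trees for handling collisions. A bucket with few colliding nodes is still stored as a linked list; once the chain grows beyond 8 nodes (the TREEIFY_THRESHOLD) and the table capacity is at least 64, it is converted into a red-black tree, which offers O(log n) lookups.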

Why is the load factor 0.75?

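The value is an empirical trade-off between space and time. If it were smaller, say 0.5, the table would be resized as soon as it was half full, which wastes space; if it were larger, say 1, the table would be resized only once it held as many entries as it has buckets, which raises the likelihood of collisions during get and put.
A HashMap also cannot grow without bound: once the capacity reaches MAXIMUM_CAPACITY, it is no longer resized.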
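The arithmetic behind the threshold is simple enough to spell out. The sketch below computes the resize threshold for a given capacity and, as a hypothetical convenience, the smallest power-of-two capacity that holds a given number of entries without resizing; ThresholdDemo and capacityFor are names made up for this example.

```java
class ThresholdDemo {
    // Number of entries a table of this capacity holds before it is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical helper: smallest power-of-two capacity whose threshold
    // is at least expectedEntries, so that no resize is triggered.
    static int capacityFor(int expectedEntries, float loadFactor) {
        int cap = 1;
        while (cap < (1 << 30) && threshold(cap, loadFactor) < expectedEntries) {
            cap <<= 1;
        }
        return cap;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));    // prints 12
        System.out.println(threshold(32, 0.75f));    // prints 24
        System.out.println(capacityFor(100, 0.75f)); // prints 256
    }
}
```

With the default settings the 13th entry triggers the first resize, and a map expected to hold 100 entries needs 256 buckets, because 128 × 0.75 = 96 is too small.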

Why is HashMap's capacity a power of two?

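There are two reasons:
1. It affects how an element's position among the buckets is computed.
Put simply, the bucket for an element could be chosen by the remainder of the modulo operation hash % capacity (where hash is the hash of the element's key).
HashMap replaces the modulo with a bit operation: (capacity - 1) & hash. When capacity is a power of two, this yields the same result as the modulo for non-negative hashes (and, unlike %, never a negative index), and a bitwise AND is cheaper than a division.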
2. It affects where elements are placed in the new table (newCap) after a resize.
For details, see the following links:
https://www.cnblogs.com/zhuxiaopijingjing/p/12334349.html
https://www.cnblogs.com/liuwhut/p/13267711.html
https://segmentfault.com/a/1190000017509668
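In short, keeping the capacity a power of two makes computing an element's bucket cheaper and makes resizing more efficient, because entries do not have to be rehashed and their collisions re-resolved after a resize.
It all comes down to efficiency.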
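The equivalence between the mask and the modulo, and the way it breaks for other capacities, can be checked with a few lines. This is a sketch with invented names (MaskVsModulo, byMask, byModulo); Math.floorMod is used for the modulo so that negative hashes also produce a non-negative index.

```java
class MaskVsModulo {
    // Bucket index by bit mask, as HashMap computes it.
    static int byMask(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    // Bucket index by (floor) modulo, always in [0, capacity).
    static int byModulo(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        // Power-of-two capacity: both methods agree.
        System.out.println(byMask(37, 16) + " " + byModulo(37, 16)); // prints 5 5
        // Capacity 10 is not a power of two: the mask 9 (binary 1001)
        // can only ever produce the indexes 0, 1, 8 and 9.
        System.out.println(byMask(6, 10) + " " + byModulo(6, 10));   // prints 0 6
    }
}
```

With a capacity that is not a power of two, the mask would leave most buckets permanently empty, which is why the capacity is constrained rather than the index formula.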
