A Detailed Walkthrough of the HashMap Source Code

litteTc

Column: Source Code Analysis. Tags: HashMap, source code walkthrough

Article link: https://blog.csdn.net/qq_15769369/article/details/79559546

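Time flies, and suddenly it was March. More and more people were coming to the company for interviews. On our after-lunch walks I kept hearing my mentor complain: "Is HashMap really that hard to answer these days? I thought it was a pretty basic question. But as soon as I go a little deeper and ask about the implementation, almost everyone says they don't know." People with two or three years of experience probably just hadn't prepared, and they rarely run into this kind of question at work, so over time they forget. It also reminded me that I should use my spare time to review the Java source code again. After all, as Shakespeare said: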

 
 There are a thousand Hamlets in a thousand people's eyes 

 
As your skills improve, coming back to these low-level implementations may bring new understanding and insight.

 
(Based on JDK 7)

 
1. What is a HashMap?

 
First, we need to know what hashing is.

 
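① Hash lookup is a searching technique used with data structures. Its average time complexity is lower than that of other search algorithms (O(1) on average, compared with O(log n) for binary search). For this reason hash tables are used heavily in practice, and HashMap is Java's built-in hash-table implementation.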

 
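② The basic idea of a hash function is to set up a fixed correspondence between a record's key and its storage address. To find a record, we compute its storage address from its key. That tells us quickly whether the record exists and where it is stored.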

 
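③ Load factor: the load factor measures how full a hash table is allowed to get before its capacity is automatically increased. In other words, it measures how much of the table's space is in use. A larger load factor means the table is more densely filled, and a smaller one means it is sparser. A larger load factor uses space more fully, at the cost of lower lookup efficiency (more collisions). A load factor that is too small leaves the table too sparse and wastes a lot of space. HashMap's default load factor is 0.75, and in general there is no need to change it.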

 
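④ The weakness of hash functions and how to address it: in hash storage, different keys may map to the same address. This is called a collision, and we need a method for handling collisions. Of course, our predecessors have already come up with many such methods, which I won't go into here.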

 
⑤ From the discussion above, the best-case time complexity of hash lookup (when there are no collisions) is O(1).
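The following example makes collisions concrete. In Java, "Aa" and "BB" are different strings with the same hashCode(), 2112. This follows from the rule that String.hashCode() is s[0]*31^(n-1) + ... + s[n-1]. A HashMap still keeps both mappings, because within a bucket it tells keys apart with equals(). The class name CollisionDemo is hypothetical and only for illustration; the behavior shown is the same on JDK 7 and later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: two unequal keys with identical hash codes.
class CollisionDemo {
    static Map<String, Integer> collidingMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // "Aa".hashCode() == 65 * 31 + 97 == 2112
        map.put("BB", 2); // "BB".hashCode() == 66 * 31 + 66 == 2112
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"
        Map<String, Integer> map = collidingMap();
        // Both mappings survive: the collision is resolved by comparing keys with equals()
        System.out.println(map.size() + " " + map.get("Aa") + " " + map.get("BB")); // prints "2 1 2"
    }
}
```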

 
Knowledge point: resolving hash collisions

 
 https://www.cnblogs.com/novalist/p/6396410.html 

 
Next, we need to know what a Map is.

 
Map is an interface in Java, and it is an important data structure.

 
A Map is a mapping from keys to values (records). Keys may not be duplicated, and each key maps to at most one value.
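The one-value-per-key rule shows up in the return value of put(). put() returns the value previously associated with the key, or null if there was none, and the new value replaces the old one. This is a minimal sketch using the standard java.util.HashMap; the class name UniqueKeyDemo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: putting the same key twice keeps a single mapping.
class UniqueKeyDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        Integer first = map.put("k", 1);  // null: "k" had no previous mapping
        Integer second = map.put("k", 2); // 1: the old value is replaced and returned
        System.out.println(first + " " + second); // prints "null 1"
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build()); // prints "{k=2}" (after build() itself prints "null 1")
    }
}
```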

 
Many classes in Java implement the Map interface, and HashMap is one of them.

 
Now back to the question of what a HashMap is.

 
HashMap is a hash-table-based class that implements the Map interface.

 
In other words, HashMap has both the key-value characteristics of a Map and the characteristics of a hash table.

 
Put simply, with the HashMap class:

 
For a lookup, we are given a key. The hash algorithm computes where the key-value pair is stored, and we fetch the value from there.

 
For storage, the hash algorithm computes where the key-value pair should be stored, and we put it there.

 
So when there are no collisions, HashMap's store and lookup operations take O(1) time.
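This sketch shows how "compute the storage location from the key" can look. It is modeled on JDK 7's HashMap.hash() and indexFor(). First a supplemental hash mixes the higher bits of hashCode() into the low bits. Then `h & (length - 1)` picks the bucket, which equals `h mod length` only when length is a power of two. Assumptions: hashSeed is 0 (the default, so JDK 7's alternative String-hashing branch is skipped), and keys are non-null (JDK 7 handles a null key separately, in bucket 0). The class name Jdk7HashSketch is hypothetical.

```java
// Hypothetical sketch modeled on JDK 7's HashMap.hash() (hashSeed == 0) and indexFor().
class Jdk7HashSketch {
    // Supplemental hash: XOR-shifts mix higher bits of hashCode() into the low bits,
    // because indexFor() only looks at the low bits. Assumes k != null.
    static int hash(Object k) {
        int h = 0; // assumption: hashSeed == 0, i.e. alternative hashing is off
        h ^= k.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index. For a power-of-two length this is the same as Math.floorMod(h, length),
    // but it is computed with a single AND instead of a division.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = hash("key");
        System.out.println("hash=" + h + ", bucket=" + indexFor(h, 16));
    }
}
```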

 
2. HashMap's inheritance hierarchy

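In IDEA, press Ctrl+Shift+Alt+U to view HashMap's class hierarchy; the source code shows the same thing. HashMap extends the abstract class AbstractMap and also implements the Map interface. (Implementing Map again is actually unnecessary, since AbstractMap already implements it. There are many explanations online; perhaps the author simply made a mistake back then.) Extending AbstractMap saves the work of implementing every method of Map directly. HashMap also implements the Cloneable and Serializable interfaces, and it overrides clone().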
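If IDEA isn't at hand, the same hierarchy can be checked with reflection. getInterfaces() returns only the interfaces named directly in HashMap's `implements` clause. The example also shows that AbstractMap already implements Map, which is why the extra `implements Map` is redundant. This is a sketch run against whatever JDK you have; HashMap's declaration is unchanged in this respect since JDK 7. The class and method names are hypothetical.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: inspects HashMap's superclass and directly declared interfaces.
class HierarchyDemo {
    static List<Class<?>> directInterfaces() {
        return Arrays.asList(HashMap.class.getInterfaces());
    }

    public static void main(String[] args) {
        System.out.println("superclass: " + HashMap.class.getSuperclass().getName());
        for (Class<?> c : directInterfaces()) {
            System.out.println("implements: " + c.getName());
        }
        // true: AbstractMap already implements Map, so HashMap re-declaring it is redundant
        System.out.println(Map.class.isAssignableFrom(AbstractMap.class));
        System.out.println(directInterfaces().contains(Serializable.class));
    }
}
```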

 
Next, let's look at how HashMap implements clone():

 
1. It calls AbstractMap.clone(), which in turn calls Object.clone(), producing a shallow copy of the object.

 
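2. It checks whether the result's Entry array is the shared empty table (EMPTY_TABLE). If it is not, it calls inflateTable(int toSize) to set up the new map's own capacity, Entry array and hashSeed.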

 
3. It resets entrySet, modCount and size, then calls init(). init() is an empty hook in HashMap; subclasses such as LinkedHashMap override it.

 
4. It puts every entry of the current map into the copied map (putAllForCreate(this)).

 
Knowledge point: shallow copy vs. deep copy of Java objects

 
 http://blog.csdn.net/pony_maggie/article/details/52091588 
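What "shallow" means for HashMap.clone() can be seen with a mutable value. The clone gets its own table and entries, so adding a key to the copy does not affect the original. The key and value objects themselves are shared, however, so mutating a value through the copy is visible through the original. This is a minimal sketch; the class and method names are hypothetical, and the behavior is the same on JDK 7 and later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

// Hypothetical demo class: HashMap.clone() copies the structure, not the values.
class ShallowCloneDemo {
    static HashMap<String, List<String>> sample() {
        HashMap<String, List<String>> map = new HashMap<>();
        map.put("fruits", new ArrayList<>(Arrays.asList("apple")));
        return map;
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, List<String>> cloneAndMutate(HashMap<String, List<String>> original) {
        HashMap<String, List<String>> copy = (HashMap<String, List<String>>) original.clone();
        copy.get("fruits").add("pear");     // the List object is shared, so the original sees this
        copy.put("veg", new ArrayList<>()); // a new mapping only changes the copy's own table
        return copy;
    }

    public static void main(String[] args) {
        HashMap<String, List<String>> original = sample();
        HashMap<String, List<String>> copy = cloneAndMutate(original);
        System.out.println(original.get("fruits")); // prints "[apple, pear]"
        System.out.println(original.containsKey("veg") + " " + copy.containsKey("veg")); // prints "false true"
    }
}
```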
   

 
    public Object clone() {
        HashMap<K,V> result = null;
        try {
            result = (HashMap<K,V>)super.clone();
        } catch (CloneNotSupportedException e) {
            // assert false;
        }
        if (result.table != EMPTY_TABLE) {
            result.inflateTable(Math.min(
                (int) Math.min(
                    size * Math.min(1 / loadFactor, 4.0f),
                    // we have limits...
                    HashMap.MAXIMUM_CAPACITY),
                table.length));
        }
        result.entrySet = null;
        result.modCount = 0;
        result.size = 0;
        result.init();
        result.putAllForCreate(this);

        return result;
    }

    /**
     * Inflates the table.
     */
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }
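The sizing arithmetic in clone(), inflateTable() and readObject() can be tried out on its own. This sketch assumes it reproduces JDK 7's roundUpToPowerOf2() (the smallest power of two that is at least the requested size) and the capacity/threshold expressions quoted above. The requested size is mappings / loadFactor, but never more than 4x the mapping count. The class and helper names (CapacitySketch, requestedCapacity, threshold) are hypothetical.

```java
// Hypothetical sketch of the capacity arithmetic used by JDK 7's HashMap.
class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Modeled on JDK 7's roundUpToPowerOf2: smallest power of two >= number (minimum 1).
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Size requested by clone()/readObject(): mappings / loadFactor, at most 4x mappings.
    static int requestedCapacity(int mappings, float loadFactor) {
        return (int) Math.min(mappings * Math.min(1 / loadFactor, 4.0f), MAXIMUM_CAPACITY);
    }

    // threshold as assigned in inflateTable()
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int requested = requestedCapacity(10, 0.5f); // 10 * 2 = 20
        int capacity = roundUpToPowerOf2(requested); // 32
        System.out.println(requested + " -> " + capacity + ", threshold " + threshold(capacity, 0.5f)); // prints "20 -> 32, threshold 16"
    }
}
```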

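While reading, I noticed that HashMap implements the Serializable interface, yet the Entry array is declared as transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;. Because of the transient keyword on the Entry array, default serialization cannot save the data stored in it. Reading further, it turns out that HashMap implements its own writeObject() and readObject() methods, which write out and read back the contents of the Entry array. So why does HashMap go to the trouble of handling the Entry array itself in writeObject()?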

 
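As we all know, HashMap uses a key's hash value to decide which position in the array a key-value pair goes to. However, the same key does not necessarily get the same hash value in different JVMs. Take a key object whose hashCode() is not overridden: a JVM on Windows might place it in slot 0 of the table, while a JVM on Linux might place it in slot 1. If the table were serialized as-is, a lookup for that key after reading it back in another JVM might search the wrong slot and fail to find the value.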

 
So how is hashCode implemented, and why does it differ between JVMs (Java processes)?

 
Let's look at the source of Java's Object.hashCode().

 
It turns out to be a native method.

 
Its Javadoc says:

 
 This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language. 

 
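Roughly: the hashCode value is typically computed from the object's address in memory, although the Javadoc says this technique is not required. When the same object is created in different program runs, its memory address differs, so the generated hashCode differs as well.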

 
How HashMap handles this

 
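During deserialization, readObject() calls putForCreate() for each key-value pair it reads. putForCreate() recomputes the key's hash with hash() and then calls indexFor() to get the bucket index in the new table. As a result, every key and value ends up in the correct position of the array in the current JVM.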

 
Knowledge point: Java serialization

 
 http://developer.51cto.com/art/201202/317181.htm 
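The same pattern can be sketched on a much smaller class. The bucket array is transient. writeObject() writes only the number of mappings followed by alternating keys and values, and readObject() rebuilds empty buckets and re-inserts every pair, so each bucket index is recomputed in the reading JVM. TinyHashStore, its fixed size of 16 buckets and the helper roundTrip() are hypothetical. It is a simplified illustration of the idea, not HashMap's actual code, and String keys are used only for brevity (String.hashCode() is in fact stable across JVMs).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: a tiny String -> String hash store whose bucket array is transient.
class TinyHashStore implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int BUCKETS = 16; // fixed power of two, no resizing in this sketch

    // transient: the bucket layout depends on hashCode(), so it is rebuilt rather than written
    private transient List<List<String[]>> buckets = emptyBuckets();
    private transient int size;

    private static List<List<String[]>> emptyBuckets() {
        List<List<String[]>> b = new ArrayList<>(BUCKETS);
        for (int i = 0; i < BUCKETS; i++) b.add(new ArrayList<>());
        return b;
    }

    private static int indexFor(String key) {
        return key.hashCode() & (BUCKETS - 1);
    }

    void put(String key, String value) {
        List<String[]> bucket = buckets.get(indexFor(key));
        for (String[] e : bucket) {
            if (e[0].equals(key)) { e[1] = value; return; } // existing key: replace value
        }
        bucket.add(new String[] {key, value});
        size++;
    }

    String get(String key) {
        for (String[] e : buckets.get(indexFor(key))) {
            if (e[0].equals(key)) return e[1];
        }
        return null;
    }

    int size() { return size; }

    // Like HashMap.writeObject(): the mapping count, then keys and values alternately.
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(size);
        for (List<String[]> bucket : buckets) {
            for (String[] e : bucket) {
                s.writeObject(e[0]);
                s.writeObject(e[1]);
            }
        }
    }

    // Like HashMap.readObject(): fresh buckets, then re-insert so indexes are recomputed here.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        buckets = emptyBuckets();
        size = 0;
        int mappings = s.readInt();
        for (int i = 0; i < mappings; i++) {
            String key = (String) s.readObject();
            String value = (String) s.readObject();
            put(key, value);
        }
    }

    // Serializes to a byte array and back; checked exceptions are wrapped for brevity.
    static TinyHashStore roundTrip(TinyHashStore in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TinyHashStore) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TinyHashStore store = new TinyHashStore();
        store.put("a", "1");
        store.put("b", "2");
        TinyHashStore copy = roundTrip(store);
        System.out.println(copy.size() + " " + copy.get("a") + " " + copy.get("b")); // prints "2 1 2"
    }
}
```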
   

 
Implementation of writeObject() and readObject():

    /**
     * Save the state of the <tt>HashMap</tt> instance to a stream (i.e.,
     * serialize it).
     *
     * @serialData The <i>capacity</i> of the HashMap (the length of the
     *             bucket array) is emitted (int), followed by the
     *             <i>size</i> (an int, the number of key-value
     *             mappings), followed by the key (Object) and value (Object)
     *             for each key-value mapping. The key-value mappings are
     *             emitted in no particular order.
     */
    private void writeObject(java.io.ObjectOutputStream s)
        throws IOException
    {
        // Write out the threshold, loadfactor, and any hidden stuff
        s.defaultWriteObject();

        // Write out number of buckets
        if (table == EMPTY_TABLE) {
            s.writeInt(roundUpToPowerOf2(threshold));
        } else {
            s.writeInt(table.length);
        }

        // Write out size (number of Mappings)
        s.writeInt(size);

        // Write out keys and values (alternating)
        if (size > 0) {
            for (Map.Entry<K,V> e : entrySet0()) {
                s.writeObject(e.getKey());
                s.writeObject(e.getValue());
            }
        }
    }

    private static final long serialVersionUID = 362498820763181265L;

    /**
     * Reconstitute the {@code HashMap} instance from a stream (i.e.,
     * deserialize it).
     */
    private void readObject(java.io.ObjectInputStream s)
        throws IOException, ClassNotFoundException
    {
        // Read in the threshold (ignored), loadfactor, and any hidden stuff
        s.defaultReadObject();
        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new InvalidObjectException("Illegal load factor: " +
                                             loadFactor);
        }

        // set other fields that need values
        table = (Entry<K,V>[]) EMPTY_TABLE;

        // Read in number of buckets
        s.readInt(); // ignored.

        // Read number of mappings
        int mappings = s.readInt();
        if (mappings < 0)
            throw new InvalidObjectException("Illegal mappings count: " +
                                             mappings);

        // capacity chosen by number of mappings and desired load (if >= 0.25)
        int capacity = (int) Math.min(
                mappings * Math.min(1 / loadFactor, 4.0f),
                // we have limits...
                HashMap.MAXIMUM_CAPACITY);

        // allocate the bucket array;
        if (mappings > 0) {
            inflateTable(capacity);
        } else {
            threshold = capacity;
        }

        init();  // Give subclass a chance to do its thing.

        // Read the keys and values, and put the mappings in the HashMap
        for (int i = 0; i < mappings; i++) {
            K key = (K) s.readObject();
            V value = (V) s.readObject();
            putForCreate(key, value);
        }
    }

 
2. HashMap's fields

 
Default initial capacity; it must be a power of 2. Value: 16.

    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

 
Maximum capacity. If a constructor argument specifies a higher value, the maximum capacity is used instead. 2^30 = 1073741824.

    static final int MAXIMUM_CAPACITY = 1 << 30;

 
Default load factor, used when the constructor does not specify one. Value: 0.75.

    static final float DEFAULT_LOAD_FACTOR = 0.75f;

 
Default empty table: the shared empty-table instance used while the table has not yet been inflated.

    static final Entry<?,?>[] EMPTY_TABLE = {};

 
The core storage structure of HashMap. All stored data lives in this table, which is resized when necessary, and its length must always be a power of 2.

    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

 
The number of key-value mappings stored in the HashMap (the total across all of the buckets' linked lists).

    transient int size;

 
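threshold is the resize threshold, equal to capacity * load factor. When the HashMap's size reaches threshold (and, in JDK 7, the target bucket is already occupied), a resize is performed. While the table is still EMPTY_TABLE, threshold holds the initial capacity to use when the table is inflated.

    int threshold;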
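This sketch simulates how threshold drives growth. Before each new entry is added, if size >= threshold, the capacity doubles and threshold is recomputed as capacity * loadFactor. With the defaults (16, 0.75) the 13th entry triggers the first resize. This is a simplified model under an explicit assumption: JDK 7's addEntry() also requires the target bucket to be non-empty before resizing, and that condition is ignored here, as is MAXIMUM_CAPACITY. The class name ResizeSimulation is hypothetical.

```java
// Hypothetical, simplified model of HashMap growth (ignores JDK 7's "bucket occupied" check).
class ResizeSimulation {
    static int capacityAfterInserts(int n, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size >= threshold) {           // checked before the new entry is added
                capacity *= 2;                 // like resize(2 * table.length)
                threshold = (int) (capacity * loadFactor);
            }
            size++;
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {12, 13, 24, 25}) {
            // prints 16, 32, 32, 64 for n = 12, 13, 24, 25
            System.out.println(n + " entries -> capacity " + capacityAfterInserts(n, 16, 0.75f));
        }
    }
}
```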

 
Load factor: measures how full the HashMap is allowed to get. The HashMap's current actual load is size / capacity.

    final float loadFactor;

 
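Modification count: the number of times this HashMap has been structurally modified. Structural modifications are those that change the number of mappings or otherwise modify the internal structure (for example, a rehash). This field is used to make iterators over the HashMap's collection views fail-fast.

    transient int modCount;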
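This sketch shows modCount at work. Adding a new key while iterating is a structural modification, so the iterator's next call throws ConcurrentModificationException. Replacing the value of an existing key does not change modCount, so that iteration completes normally. The HashMap Javadoc warns that fail-fast behavior is best-effort and cannot be guaranteed, so treat this as an illustration rather than something to rely on. The class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: structural vs. non-structural changes during iteration.
class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adding a new key increments modCount; the iterator notices on its next call.
    static boolean structuralChangeDetected() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("d", 4);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Replacing an existing key's value is not structural, so iteration completes.
    static boolean valueReplacementTolerated() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 42);
            }
        } catch (ConcurrentModificationException e) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(structuralChangeDetected() + " " + valueReplacementTolerated()); // prints "true true"
    }
}
```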

 
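The default threshold of map capacity above which alternative hashing is used for String keys. The default value, Integer.MAX_VALUE, effectively disables alternative hashing. It can be overridden with the system property jdk.map.althashing.threshold.

    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;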

 
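Used when computing hash values; its initial value is 0. It is a randomizing value associated with this instance that is applied to the keys' hash codes to make hash collisions harder to find. A value of 0 means alternative hashing is disabled.

    transient int hashSeed = 0;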

 
3. HashMap's data structure (Entry array + linked lists):

 
Entry is a static nested class of HashMap.

 
    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        // ... remaining methods (hashCode(), toString(), ...) omitted
    }

 
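In short, HashMap is made up of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash collisions (separate chaining). If the array slot we locate contains no linked list (the current entry's next is null), lookups, insertions and similar operations are fast and need only one addressing step. If the slot does contain a linked list, linking a new entry in is still O(1), because the newest Entry is inserted at the head of the list and only the reference chain needs to change. (put() does first walk the list to check whether the key already exists.) Lookups, however, must traverse the list and compare keys one by one using the key object's equals() method. So, for performance, the fewer linked lists a HashMap contains, the better.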
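Here is a minimal sketch of the array-plus-linked-list structure in the JDK 7 style. Each slot holds a singly linked chain of Entry nodes, and put() either replaces the value of an existing key or links a new Entry in at the head of the chain. get() walks the chain and compares the hash first, then equals(). Assumptions made for brevity: a fixed table of 16 slots (no resizing), no null keys, and raw hashCode() without JDK 7's supplemental hash. ChainedMap and bucketKeysOf() are hypothetical names, not HashMap's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fixed-size, JDK 7-style separate chaining with head insertion.
class ChainedMap<K, V> {
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int hash;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // power-of-two length

    private int indexFor(int h) {
        return h & (table.length - 1);
    }

    V get(K key) {
        int h = key.hashCode();
        // Walk the chain, checking the hash first and then equals()
        for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || key.equals(e.key))) return e.value;
        }
        return null;
    }

    V put(K key, V value) {
        int h = key.hashCode();
        int i = indexFor(h);
        // Existing key: replace the value and return the old one
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && (e.key == key || key.equals(e.key))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // New key: link a new entry in at the head of the chain (one pointer update)
        table[i] = new Entry<>(h, key, value, table[i]);
        return null;
    }

    // Keys in the same bucket as the given key, head of the chain first.
    List<K> bucketKeysOf(K key) {
        List<K> keys = new ArrayList<>();
        for (Entry<K, V> e = table[indexFor(key.hashCode())]; e != null; e = e.next) keys.add(e.key);
        return keys;
    }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so same bucket
        System.out.println(map.bucketKeysOf("Aa")); // prints "[BB, Aa]": newest entry at the head
        System.out.println(map.get("Aa") + " " + map.get("BB")); // prints "1 2"
    }
}
```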

 
Knowledge point: how equals() and hashCode() of Java objects relate and differ.

 
 http://www.importnew.com/25783.html 

 
4. HashMap's constructors

HashMap has four constructors. The core one takes two parameters:

 
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

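  1. If the initial capacity is less than 0, throw an exception.

  2. If the initial capacity is greater than the maximum capacity, use the maximum capacity 2^30 instead.

  3. If the load factor is less than or equal to 0, or is not a number (NaN), throw an exception.

  4. Assign the fields. loadFactor is stored, and initialCapacity is kept in threshold for now; the table stays EMPTY_TABLE until the first put() calls inflateTable(threshold).

  5. Call init(), a hook that is empty in HashMap and implemented differently by its subclasses.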
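The validation steps above are easy to observe from outside. A negative capacity, a load factor <= 0 and a NaN load factor are rejected with IllegalArgumentException. A capacity above 2^30 is silently clamped rather than rejected, and because the table is allocated lazily, no large array is created. This sketch uses only the public constructor; the class name ConstructorChecks is hypothetical.

```java
import java.util.HashMap;

// Hypothetical demo class: which constructor arguments HashMap rejects.
class ConstructorChecks {
    // Returns true if HashMap(initialCapacity, loadFactor) throws IllegalArgumentException.
    static boolean rejects(int initialCapacity, float loadFactor) {
        try {
            new HashMap<String, String>(initialCapacity, loadFactor);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(-1, 0.75f));                // true: negative capacity
        System.out.println(rejects(16, 0f));                   // true: load factor <= 0
        System.out.println(rejects(16, Float.NaN));            // true: load factor is NaN
        System.out.println(rejects(Integer.MAX_VALUE, 0.75f)); // false: clamped to 2^30
    }
}
```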

 
5. HashMap's core methods:
