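HashMap is probably one of the data structures asked about most often in Java interviews. This post walks through how HashMap is designed in JDK 1.7, covering the following points:
1: HashMap's data structure, the put operation, and when the map resizes.
2: Why is HashMap not thread-safe?
1: HashMap's data structure.
1.1: First, a look at a few basic fields in HashMap.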
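DEFAULT_INITIAL_CAPACITY: the default initial capacity, 1 << 4 == 16;
MAXIMUM_CAPACITY: the maximum capacity, 1 << 30;
DEFAULT_LOAD_FACTOR: the default load factor, 0.75;
loadFactor: the load factor; if none is specified, the default load factor is used;
threshold: the resize threshold (capacity * loadFactor once the table has been allocated); when size reaches it, HashMap automatically grows to twice its current capacity;
modCount: used to implement fail-fast behaviour, which is common in non-thread-safe classes such as ArrayList, LinkedList and HashMap. When an iterator is created, it records the collection's current modCount; if, during traversal, the collection's modCount no longer matches the value stored in the iterator, a ConcurrentModificationException is thrown.
table: an array of type Entry<K,V>[]. Each slot holds the head of a singly linked list of Entry nodes, so the overall structure of HashMap is array + linked list (see the figure).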
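The fail-fast check described for modCount can be observed with the standard java.util.HashMap. The following is a minimal sketch; the class name FailFastDemo and its method are made up for illustration. It adds a new key while iterating over the key set, which is a structural modification, so the iterator's next step should detect the changed modCount.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class FailFastDemo {
    /** Returns true if a structural change during iteration trips the fail-fast check. */
    public static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Adding a NEW key is a structural modification: modCount changes,
                // while the iterator still holds the value it recorded at creation.
                map.put(key + "-copy", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + modifyWhileIterating());
    }
}
```

Overwriting the value of an existing key does not count as a structural modification, which is why put only increments modCount on the path that adds a new entry.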
1.2: The put method:
public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    // The bucket table[i] is now known; walk the singly linked list stored there.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // Same hash and same key: overwrite the old value and return it.
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(hash, key, value, i); // key not present: add a new entry
    return null;
}
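To make the three steps of put easier to see (locate the bucket, scan the chain, then overwrite or insert), here is a stripped-down sketch. MiniMap17 and its Node class are hypothetical names; the sketch assumes a fixed table of 16 buckets, never resizes, and folds the null-key case into hash 0 instead of using a separate putForNullKey.

```java
// Hypothetical, simplified map that mirrors the shape of put in JDK 1.7.
// Assumptions: fixed 16 buckets, no resizing, no modCount, no hashSeed.
class MiniMap17<K, V> {
    /** One node of a bucket's singly linked list. */
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    private int size;

    // Same bit mixing as the hash method quoted in this post, with hashSeed taken as 0.
    private static int hash(Object key) {
        if (key == null) return 0; // null keys map to hash 0, thus bucket 0
        int h = key.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    private static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    private static boolean sameKey(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    /** Returns the previous value for key, or null if the key was absent. */
    public V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);            // step 1: locate the bucket
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) { // step 2: scan the chain
                V old = e.value;
                e.value = value;                         // step 3a: overwrite
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]); // step 3b: insert at the head
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && sameKey(key, e.key)) return e.value;
        }
        return null;
    }

    public int size() {
        return size;
    }
}
```

Putting the same key twice leaves size at 1 and returns the first value from the second call, which matches the "overwrite and return the old value" branch of the real method.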
put relies on several key helper methods. The following listings go through them one by one.
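/**
 * Inflates the table.
 * Note: allocates the table on first use.
 */
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize.
    // roundUpToPowerOf2 returns the smallest power of two that is greater than or
    // equal to toSize; for example, toSize = 17 yields 32. HashMap's capacity is
    // therefore always a power of two. The reasons are not covered in detail here,
    // but there are two main advantages: 1: fewer hash collisions, 2: the index can
    // be computed with cheap bit operations.
    int capacity = roundUpToPowerOf2(toSize);
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}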
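The body of roundUpToPowerOf2 is not shown in this post. One possible way to write it, built on Integer.highestOneBit, is sketched below; the class name CapacitySketch is hypothetical and the code should be read as an illustration of the rounding rule, not as a quotation of the JDK source.

```java
// Hypothetical helper class illustrating "smallest power of two >= number".
class CapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    public static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 lies in [number, 2 * number) once rounded down to its
        // highest set bit, so exact powers of two are returned unchanged.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 16, 17, 100}) {
            System.out.println(n + " -> " + roundUpToPowerOf2(n));
        }
    }
}
```

With this rule 16 stays 16, 17 becomes 32, and 100 becomes 128.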
/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions. This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 * Note: besides k.hashCode(), the method applies XOR and shift operations,
 * mainly to reduce hash collisions.
 */
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
/**
 * Returns index for hash code h.
 * Note: h & (length-1) gives the position of the key in the table array.
 */
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
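The two methods above work together: hash mixes high bits into the low bits, and indexFor keeps only the low bits. The sketch below (IndexSketch and spread are hypothetical names; spread copies the shift-and-XOR lines of hash with hashSeed assumed to be 0) shows that the mask agrees with a non-negative modulo when length is a power of two, and that two hash codes differing only in high bits collide without the mixing step.

```java
// Hypothetical demo of the bit mixing in hash and the masking in indexFor.
class IndexSketch {
    /** The shift-and-XOR steps of hash, assuming hashSeed == 0. */
    public static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Same expression as indexFor; length must be a power of two. */
    public static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000; // differs from a only above bit 15
        System.out.println("raw:    " + indexFor(a, 16) + " vs " + indexFor(b, 16));
        System.out.println("spread: " + indexFor(spread(a), 16) + " vs " + indexFor(spread(b), 16));
        // For power-of-two lengths the mask equals a non-negative modulo.
        System.out.println(indexFor(-1, 16) == Math.floorMod(-1, 16));
    }
}
```

The mask only behaves like a modulo because length - 1 is a run of 1 bits, which is one reason the capacity is always kept at a power of two.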
/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket. It is the responsibility of this
 * method to resize the table if appropriate.
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Resize only when size has reached threshold AND the target bucket is already
    // occupied; the new table has twice the capacity of the old one.
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
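The resize condition in addEntry has two parts, which is easy to overlook: reaching the threshold is not enough if the new entry would land in an empty bucket. A small sketch of that decision, together with the threshold arithmetic used by inflateTable and resize, follows. ResizePolicySketch and its method names are hypothetical.

```java
// Hypothetical helper that isolates the resize decision and threshold formula.
class ResizePolicySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Mirrors the condition in addEntry: both parts must hold. */
    public static boolean shouldResize(int size, int threshold, boolean bucketOccupied) {
        return size >= threshold && bucketOccupied;
    }

    /** Mirrors the threshold expression used when the table is (re)allocated. */
    public static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int i = 0; i < 3; i++) {
            System.out.println("capacity " + capacity + " -> threshold " + thresholdFor(capacity, 0.75f));
            capacity *= 2; // resize(2 * table.length)
        }
    }
}
```

With the default load factor the thresholds for capacities 16, 32 and 64 are 12, 24 and 48. Because of the second half of the condition, a JDK 1.7 map can briefly hold more entries than its threshold if the new keys keep landing in empty buckets.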
/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity. This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 * @param newCapacity the new capacity, MUST be a power of two;
 * must be greater than current capacity unless current
 * capacity is MAXIMUM_CAPACITY (in which case value
 * is irrelevant).
 * Note: this method itself is straightforward; the real work happens in transfer.
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
/**
 * Transfers all entries from current table to newTable.
 * Note: moves every entry of the current table into the new table.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) { // iterate over the old table
        // Walk the chain under table[i] and move each entry into the new table.
        // Note: this loop is where the thread-safety problem arises.
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
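Because transfer inserts each entry at the head of its new bucket, a chain comes out in reverse order, and that reversal is what makes an unlucky interleaving of two threads dangerous. The sketch below replays one such interleaving on a single thread so that it is deterministic. TransferSketch, Node and all method names are hypothetical, and the sketch assumes that every node lands in the same bucket of the new table.

```java
// Hypothetical single-threaded replay of the head-insertion loop in transfer.
// Assumption: all nodes of the chain map to the same bucket after the resize.
class TransferSketch {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Builds keys[0] -> keys[1] -> ... */
    public static Node chain(String... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    /** Same pointer moves as the inner while loop of transfer, for one bucket. */
    public static Node transferChain(Node head) {
        Node newHead = null;
        for (Node e = head; e != null; ) {
            Node next = e.next;
            e.next = newHead; // head insertion reverses the order
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    /** Floyd's cycle detection. */
    public static boolean hasCycle(Node head) {
        Node slow = head, fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) return true;
        }
        return false;
    }

    /** Replays: thread 1 pauses after reading next, thread 2 finishes, thread 1 resumes. */
    public static boolean simulateRace() {
        Node a = chain("A", "B");  // shared bucket: A -> B
        Node e = a;
        Node next = e.next;        // thread 1 reads next (B), then is suspended
        transferChain(a);          // thread 2 completes its transfer: B -> A
        Node newHead = null;       // thread 1 resumes with its own empty new bucket
        while (e != null) {
            e.next = newHead;
            newHead = e;
            e = next;
            next = (e == null) ? null : e.next; // the read at the top of the next iteration
        }
        return hasCycle(newHead);  // A -> B -> A
    }

    public static void main(String[] args) {
        System.out.println(render(transferChain(chain("A", "B", "C"))));
        System.out.println("cycle after interleaving: " + simulateRace());
    }
}
```

In the replay, thread 1 still believes that B follows A, while thread 2 has already made A follow B. Re-inserting both nodes with that stale view links them to each other, and any later traversal of the bucket never reaches null.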
/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization). This version needn't worry about resizing the table.
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 * Note: the key is not present yet, so a new Entry is created and placed at
 * the head of the linked list in table[bucketIndex].
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
2: Why is HashMap not thread-safe?
-
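During a resize, several threads may run the rehash step (transfer) at the same time. Because transfer re-inserts every entry at the head of its new bucket, concurrent transfers can scramble the order of a chain and may leave a circular linked list behind. A later get that walks into that cycle never terminates, so the thread hangs and CPU usage climbs (the step-by-step details are easy to find online).

**How can thread safety be ensured?**

1: Guard put, and every other access to the map, with a synchronized lock.
2: Use ConcurrentHashMap instead of HashMap.
3: Wrap the map with Collections.synchronizedMap.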
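Options 2 and 3 can be tried with a few lines of code. The sketch below is only an illustration: SafeMapSketch and its methods are hypothetical names, and the thread and key counts are arbitrary. Several threads insert disjoint key ranges, and the final size is checked against the number of insertions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: several writer threads share one thread-safe map.
class SafeMapSketch {
    /** Each thread puts perThread distinct keys; returns the final map size. */
    public static int fillConcurrently(Map<Integer, Integer> map, int threadCount, int perThread) {
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int base = t * perThread; // disjoint key range per thread
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return map.size();
    }

    public static int withConcurrentHashMap() {
        return fillConcurrently(new ConcurrentHashMap<Integer, Integer>(), 4, 1000);
    }

    public static int withSynchronizedMap() {
        return fillConcurrently(Collections.synchronizedMap(new HashMap<Integer, Integer>()), 4, 1000);
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentHashMap size: " + withConcurrentHashMap());
        System.out.println("synchronizedMap size:   " + withSynchronizedMap());
    }
}
```

Both wrappers make individual calls such as put and get safe. Compound actions, for example "check then put" or iterating over a synchronizedMap, still need their own synchronization or an atomic method such as putIfAbsent.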