title: So This Is What HashMap Really Is
tags:
- Java
- JCF
- HashMap
- rehash
categories: jcf
date: 2017-09-18 19:39:51
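HashMap is probably one of the most commonly used collection classes in Java.
The analysis below is based on JDK 1.7.
Map is one of the two top-level interfaces of the JCF; the other is Collection.
The Map interface has a few notable characteristics:
- size() returns an int, so the reported size can never exceed Integer.MAX_VALUE; anything larger could not be represented. In fact, the maximum capacity of HashMap is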
/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
In other words, HashMap really does have a maximum capacity. A question to think about: why is that maximum not 1 << 31?
- containsKey, containsValue, get, and remove all take Object as their parameter rather than the generic type. For the reasons, see
what-are-the-reasons-why-map-getobject-key-is-not-fully-generic
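Both observations in the list above can be checked directly. The following is a small sketch, not JDK code; the class name `MapInterfaceNotes` and its helper methods are made up for illustration. It shows that `1 << 31` overflows into a negative int, which is why `1 << 30` is the largest usable power of two, and that `get` accepts any Object, so a lookup with the wrong key type compiles but finds nothing.

```java
import java.util.HashMap;
import java.util.Map;

final class MapInterfaceNotes {
    // Shifts 1 left by the given number of bits using int arithmetic.
    static int shiftedOne(int bits) {
        return 1 << bits;
    }

    // A map keyed by Long, used to show the effect of get(Object).
    static Map<Long, String> sample() {
        Map<Long, String> map = new HashMap<>();
        map.put(1L, "one");
        return map;
    }

    public static void main(String[] args) {
        System.out.println(shiftedOne(30)); // 1073741824
        System.out.println(shiftedOne(31)); // -2147483648, i.e. Integer.MIN_VALUE

        Map<Long, String> map = sample();
        System.out.println(map.get(1L)); // one
        // Compiles because get takes Object, but Integer 1 never equals Long 1L.
        System.out.println(map.get(1));  // null
    }
}
```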
When we use HashMap, we usually call the following constructor:
/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
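In effect we are relying on empirical defaults: an initial capacity of 16 (2^4) and a load factor of 0.75.
This introduces two new variables:
- CAPACITY: the capacity, i.e. the size of the internal array.
- LOAD_FACTOR: the load factor, i.e. how full the array may become relative to its capacity before it is grown. This parameter usually has a large impact.
For example, with a load factor of 10 the map would hold about 10 entries per bucket on average before it resizes, so nearly every lookup would have to walk a collision chain. The empirical value of 0.75 is a more reasonable trade-off.
Related to these two parameters is the threshold parameter: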
threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
This parameter is the resize threshold: once size >= threshold, the whole HashMap needs to be resized.
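As a quick sanity check, the threshold expression can be evaluated on its own. This is a minimal sketch; `ThresholdDemo` is a hypothetical class name, and `MAXIMUM_CAPACITY` is redeclared locally with the value quoted earlier.

```java
final class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as the JDK 1.7 line quoted above.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(64, 0.75f)); // 48
        System.out.println(threshold(16, 10f));   // 160
    }
}
```

With the defaults, the table is therefore considered full at 12 entries, well before all 16 buckets are necessarily occupied.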
1. Initialization
Consider the following code:
/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;
    this.loadFactor = loadFactor;
    threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    init();
}
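The rounding loop in the constructor above can be tried in isolation. The sketch below copies only that loop, plus the clamp to the maximum capacity; `CapacityRoundingDemo` and `roundUpToPowerOfTwo` are illustrative names, not JDK identifiers.

```java
final class CapacityRoundingDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= the requested capacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        // Clamp first, as the constructor does, so the loop always terminates.
        int requested = Math.min(initialCapacity, MAXIMUM_CAPACITY);
        int capacity = 1;
        while (capacity < requested)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}
```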
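Note one detail: capacity is produced by left-shifting 1, so the actual initial capacity is always a power of two (the smallest power of two that is not less than the requested number), not the number that was passed in. (Prime table sizes generally give fewer collisions; why pow(2, n) was chosen instead is explained below.) table is the array that stores Entry objects, the K-V mappings we actually work with: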
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}
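Here key is the generic key and value is the generic value, and next is the linked-list pointer to the following entry in the same bucket. That gives the whole picture: an array of buckets, each holding a singly linked list of entries.
In JDK 8, a list that grows beyond a certain length is automatically converted into a red-black tree.
2. When the developer calls put, the following happens: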
public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}
First the key is checked for null. A null key is handled specially: its hash is treated as 0.
/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}
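That is, the null key always lives in table[0], so the list in that bucket has to be scanned: if it already contains an entry whose key is null, the value is replaced; otherwise a new entry with a null key is inserted.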
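The observable behaviour of the null-key path is easy to confirm through the public API. This is only a usage sketch (the class `NullKeyDemo` is made up for illustration); it does not inspect table[0] itself.

```java
import java.util.HashMap;
import java.util.Map;

final class NullKeyDemo {
    // Puts the null key twice and returns the value displaced by the second put.
    static Integer replaceNullKey(Map<String, Integer> map) {
        map.put(null, 1);
        return map.put(null, 2); // replaces the existing null-key entry
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        System.out.println(replaceNullKey(map)); // 1
        System.out.println(map.get(null));       // 2
        System.out.println(map.size());          // 1
    }
}
```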
3. When the key is not null, the hash of the key is computed first:
/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions. This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = 0;
    if (useAltHashing) {
        if (k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h = hashSeed;
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
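The hash function tries to produce evenly distributed hash values. To do so it mixes the high bits into the low bits with several unsigned right shifts (>>>) combined by XOR; these are plain shifts, not rotations. (Java 8 reworked this step.)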
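To see what the bit mixing buys, here is a small sketch that reproduces only the two shift-and-XOR lines of hash(), leaving out the alternative-hashing branch. `SupplementalHashDemo` and `spread` are illustrative names. Two hash codes that differ only in their high bits land in the same bucket of a 16-slot table when used raw, and in different buckets after mixing.

```java
final class SupplementalHashDemo {
    // The JDK 1.7 mixing step, without the useAltHashing branch.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same masking as indexFor in HashMap.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000; // differs from a only in the high bits

        // Raw hash codes: the low 4 bits are identical, so both hit bucket 0.
        System.out.println(indexFor(a, 16) + " " + indexFor(b, 16)); // 0 0

        // After mixing, the high bits influence the bucket index.
        System.out.println(indexFor(spread(a), 16) + " " + indexFor(spread(b), 16)); // 1 2
    }
}
```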
4. Use the hash to find the position in table:
/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length-1);
}
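This explains why pow(2, n) is used as the length of table. The conventional approach would be a modulo operation, but for library-level code division is noticeably slower than a bitwise AND. pow(2, n) - 1 yields masks such as 0111, 01111, 011111, and so on.
ANDing with such a mask extracts the lowest n bits of the hash and discards the upper 32 - n bits, so the hash function must make sure the low bits vary. If the length were not a power of two, length - 1 would contain zero bits, those bit positions of the hash would be ignored, and some buckets could never be selected.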
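The two claims above can be checked with a short experiment. This is a sketch with made-up helper names (`IndexForDemo`, `sameAsFloorMod`, `reachableBuckets`): for a power-of-two length the mask gives the same result as a non-negative modulo, while for a length such as 12 the mask can only ever reach some of the buckets.

```java
import java.util.BitSet;

final class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if masking agrees with a non-negative modulo for a range of hashes.
    static boolean sameAsFloorMod(int length) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, length) != Math.floorMod(h, length))
                return false;
        }
        return true;
    }

    // Counts how many distinct bucket indexes the mask can produce.
    static int reachableBuckets(int length) {
        BitSet seen = new BitSet(length);
        for (int h = 0; h < 1024; h++)
            seen.set(indexFor(h, length));
        return seen.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(sameAsFloorMod(16));   // true
        System.out.println(sameAsFloorMod(12));   // false
        System.out.println(reachableBuckets(16)); // 16
        System.out.println(reachableBuckets(12)); // 8, since 12 - 1 = 0b1011 has a zero bit
    }
}
```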
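5. If the given key is found in the chain of that bucket, its value is simply overwritten (this is why hashCode and equals must always be overridden together; otherwise subtle bugs creep in easily):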
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
6. If the key does not exist yet:
/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket. It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}
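The code checks whether size has reached the threshold (and whether the target bucket is already occupied); if so, the table must grow. Each resize doubles the previous capacity, and every entry then has to be redistributed. Because the length doubles, the index mask gains exactly one bit, so an entry whose hash has a 0 in that bit keeps its old index and does not really move (JDK 8 optimizes for this).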
/**
 * Rehashes the contents of this map into a new array with a
 * larger capacity. This method is called automatically when the
 * number of keys in this map reaches its threshold.
 *
 * If current capacity is MAXIMUM_CAPACITY, this method does not
 * resize the map, but sets threshold to Integer.MAX_VALUE.
 * This has the effect of preventing future calls.
 *
 * @param newCapacity the new capacity, MUST be a power of two;
 *        must be greater than current capacity unless current
 *        capacity is MAXIMUM_CAPACITY (in which case value
 *        is irrelevant).
 */
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];
    boolean oldAltHashing = useAltHashing;
    useAltHashing |= sun.misc.VM.isBooted() &&
            (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean rehash = oldAltHashing ^ useAltHashing;
    transfer(newTable, rehash);
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

/**
 * Transfers all entries from current table to newTable.
 */
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
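7. Once the capacity has reached the maximum, the table is no longer expanded, and the threshold is set to Integer.MAX_VALUE.
While the capacity is still below the maximum, all existing entries have to be moved into the new array, which is quite expensive. A sensible load factor and initial capacity are therefore important. Consider how many times a large HashMap has to grow when it starts from the default size: for 10000 entries, 10000 < 2^14 * 0.75 = 12288, so the table is doubled 10 times, from 16 up to 16384.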
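The arithmetic above can be reproduced with a few lines of code. This is a simplified model rather than the JDK logic: it ignores the extra "bucket already occupied" condition in addEntry and just counts how many doublings are needed until the threshold covers the entry count. `ResizeCountDemo` and `doublingsNeeded` are hypothetical names.

```java
final class ResizeCountDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Number of times the table must double before capacity * loadFactor >= entries.
    static int doublingsNeeded(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int doublings = 0;
        while (capacity < MAXIMUM_CAPACITY && entries > capacity * loadFactor) {
            capacity <<= 1;
            doublings++;
        }
        return doublings;
    }

    public static void main(String[] args) {
        System.out.println(doublingsNeeded(10000, 16, 0.75f));    // 10
        System.out.println(doublingsNeeded(100000, 16, 0.75f));   // 14
        System.out.println(doublingsNeeded(10000, 16384, 0.75f)); // 0
    }
}
```

Passing a suitable initial capacity to the constructor, as in the last call, avoids all of the intermediate copies.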
Note also that because transfer re-links each list entry by entry, the order of the entries within a list ends up reversed.
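A stripped-down model of transfer makes both effects visible: entries either keep their index or move up by the old capacity, and their relative order within a bucket is reversed. The sketch below uses a hypothetical `Node` type that stores only a hash and a next pointer, and it omits the rehash flag.

```java
import java.util.ArrayList;
import java.util.List;

final class TransferDemo {
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Head insertion, in the style of createEntry.
    static void add(Node[] table, int hash) {
        int i = indexFor(hash, table.length);
        table[i] = new Node(hash, table[i]);
    }

    // Same loop shape as the JDK 1.7 transfer, without the rehash flag.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Lists the hashes in one bucket, from head to tail.
    static List<Integer> bucket(Node[] table, int index) {
        List<Integer> hashes = new ArrayList<>();
        for (Node e = table[index]; e != null; e = e.next)
            hashes.add(e.hash);
        return hashes;
    }

    public static void main(String[] args) {
        Node[] table = new Node[4];
        for (int h : new int[] {1, 5, 9, 13})
            add(table, h);
        System.out.println(bucket(table, 1));  // [13, 9, 5, 1]

        Node[] bigger = transfer(table, 8);
        System.out.println(bucket(bigger, 1)); // [1, 9]  (stayed; 9 used to come before 1)
        System.out.println(bucket(bigger, 5)); // [5, 13] (moved by 4; 13 used to come before 5)
    }
}
```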
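8. With the computed hash and the corresponding index, the new entry is placed directly into the bucket, and the object previously stored at that array slot simply becomes its next pointer: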
/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization). This version needn't worry about resizing the table.
 *
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
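Since no locks are used anywhere in this code, HashMap is not thread-safe; using one instance as a shared object easily leads to all sorts of problems.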
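When a map really has to be shared between threads, one common option is `java.util.concurrent.ConcurrentHashMap`. The following is a minimal sketch of that alternative, not something taken from the HashMap source; `SharedMapDemo` and `fillConcurrently` are made-up names. Several threads insert disjoint keys, and with a thread-safe map the final size is always the full count.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SharedMapDemo {
    // Starts the given number of threads, each inserting perThread distinct keys,
    // and returns the size of the map once all of them have finished.
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    map.put(base + i, i);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> safe = new ConcurrentHashMap<>();
        System.out.println(fillConcurrently(safe, 4, 10_000)); // 40000
    }
}
```

Passing a plain HashMap to the same method is not guaranteed to give 40000, and the result can differ from run to run.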
Here is a small quiz:
import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

public class HashMapKeyTest {

    @Test
    public void testA() {
        A a = new A(10);
        Map<A, Object> map = new HashMap<>();
        map.put(a, a.getA());
        // Mutate the key after it has been inserted: hashCode() now returns 100.
        a.setA(100);
        System.out.println(map.get(a));
        map.put(a, a.getA());
        System.out.println(map.size());
    }

    // Key type whose hashCode and equals both depend on a mutable field.
    static class A {
        private int a;

        public A(int a) {
            this.a = a;
        }

        public int getA() {
            return a;
        }

        public void setA(int a) {
            this.a = a;
        }

        @Override
        public int hashCode() {
            return a;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof A)) return false;
            A a1 = (A) o;
            return a == a1.a;
        }
    }
}
What does it print?