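We all know that the Java collections framework has two important classes, HashSet and HashMap. Why are they so important? First, a Set holds a collection with no duplicate objects, which is very convenient when programming: we never have to worry about two equal objects ending up in the same set. There is a downside, though: iterating over a set is somewhat less convenient. Then there is Map, which stores data as key-value pairs, much like the property names and property values of an object. But how many of us know how they work underneath? Today we will walk through the source code of Set and Map.
Since we are going to use them, let's start from initialization (Set set = new HashSet()) and then add(new Object());
// HashSet maintains a HashMap internally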
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element <tt>e</tt> to this set if
* this set contains no element <tt>e2</tt> such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.
*
* @param e element to be added to this set
* @return <tt>true</tt> if this set did not already contain the specified
* element
*/
// add simply calls put on the backing map (e is the key, PRESENT = new Object() is the value)
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
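As we can see, creating a HashSet also creates a HashMap, and adding an object calls put on that map. Because a map stores key-value pairs with unique keys, HashSet relies on exactly this property to keep its elements unique. So the part worth studying is how HashMap's put method guarantees that keys are unique.
Below is part of the code involved when the map is constructed. Apart from assigning loadFactor, every field keeps its default value: table is null, entrySet is null, and so on.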
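The return value of add makes this delegation visible from the outside. The following is a minimal sketch, not JDK code; the class name HashSetAddDemo and the helper addTwice are made up for illustration. It adds the same element twice and records what add returns each time.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetAddDemo {
    // Adds the same element twice and reports: first result, second result, size == 1.
    public static boolean[] addTwice(String element) {
        Set<String> set = new HashSet<>();
        boolean first = set.add(element);   // backing map.put returned null, so add returns true
        boolean second = set.add(element);  // backing map.put returned the old value, so add returns false
        return new boolean[] { first, second, set.size() == 1 };
    }

    public static void main(String[] args) {
        boolean[] r = addTwice("a");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "true false true"
    }
}
```

The second add leaves the set unchanged, which is the behaviour the javadoc above describes.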
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
// the Node array maintained by the map
transient Node<K,V>[] table;
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
// cached Set view of the map's entries (key-value pairs)
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
// the number of key-value pairs in the map
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
// the number of times this map has been structurally modified
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
// the constructor only sets loadFactor; everything else keeps its default
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
After initialization comes the put method. We follow the call from set.add() and focus on how the key is kept unique.
/**
* Adds the specified element to this set if it is not already present.
* More formally, adds the specified element <tt>e</tt> to this set if
* this set contains no element <tt>e2</tt> such that
* <tt>(e==null ? e2==null : e.equals(e2))</tt>.
* If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.
*
* @param e element to be added to this set
* @return <tt>true</tt> if this set did not already contain the specified
* element
*/
// add in HashSet; it clearly delegates to map.put
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
//===== above: code from HashSet; below: methods from HashMap, pasted together =====
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*/
// the method the map runs to insert a key-value pair
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
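// calls key.hashCode() and XORs the result with itself shifted right (unsigned) by 16 bits
/**
* I once badly wanted to know what this value actually is, but knowing it turned out to be of little use.
* All we need to know is that calling this method on the same object returns the same int every time.
*/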
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
/**
* Implements Map.put and related methods
*
* @param hash hash for key
* @param key the key
* @param value the value to put
* @param onlyIfAbsent if true, don't change existing value
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none
*/
// this is the key method we want to study
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
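Before analysing putVal, I found a diagram of the map's structure online, which makes things clearer:
As the diagram shows, the map is built from an array combined with singly linked lists (a bucket can also hold TreeNode instances, as the putVal code above shows, but we set that aside here). The array is our table, and its elements are of type Node.
Node structure: it holds a hash, a key, a value and, crucially, a next reference that points to the following node, which is what forms the singly linked list.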
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
// the structure of a Node; next points to the following node, which forms the singly linked list
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
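Now let's go through putVal statement by statement and see what it does:
First insert: initialising the table array
// variable declarations, nothing special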
Node<K,V>[] tab; Node<K,V> p; int n, i;
// first check whether the table array is null; on the first key-value insert it is
if ((tab = table) == null || (n = tab.length) == 0)
// initialise the table; the default length is 16, i.e. table = (Node<K,V>[])new Node[16]
// interested readers can read resize() for the detailed steps
n = (tab = resize()).length;
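Check whether the slot at the computed table index is null; if it is, the new node becomes the first node of that slot's linked list
Here n is 16. Whatever the int hash is, ANDing it with (16 - 1) always yields a value between 0 and 15. Why 15? The table was just initialised with length 16 (indices 0-15), so the index must never go beyond that range. If this is unclear, look back at the map structure diagram.
Example: suppose the hash of the key we insert is 0000 1000 (decimal 8). ANDing it with 15 gives 8.
Is tab[8] == null? On the very first insert it certainly is, so a new Node whose next is null is created and assigned to tab[8]. Some may ask whether slots should be filled starting from index 0. The map has no such rule; any index from 0 to 15 is valid, and the hash decides which one is used.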
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
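After this if statement runs, the table holds its first key-value pair: index 8 of tab contains a singly linked list with exactly one node. (We are using a set as the example, so the key is our element and the value is just the shared PRESENT object. The examples below use the same kind of key-value pair, so this is not repeated.)
Second key-value insert
Now the else branch. Suppose we insert a second key-value pair and it also lands at table index 8: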
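The index arithmetic can be tried out in isolation. This is a minimal sketch: spread copies the body of HashMap.hash shown earlier, while the class name IndexDemo and the helper indexFor are invented here for illustration and do not exist in the JDK.

```java
public class IndexDemo {
    // Same spreading step as HashMap.hash(): XOR the high 16 bits into the low 16 bits.
    public static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two (hypothetical helper).
    public static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(0b0000_1000, 16));                // 8
        System.out.println(indexFor(0b0001_1000, 16));                // 8: 24 & 15 keeps only the low 4 bits
        System.out.println(indexFor(spread(Integer.valueOf(8)), 16)); // 8: Integer.hashCode() is the value itself
        System.out.println(indexFor(spread(null), 16));               // 0: a null key hashes to 0
        System.out.println(indexFor(-1, 16));                         // 15: negative hashes still land in 0-15
    }
}
```

The second line is worth noting: two different hashes (8 and 24) map to the same slot, which is the situation the else branch of putVal has to handle.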
else {
Node<K,V> e; K k;
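// from the if condition above, p is the node at table index 8 (the Node inserted first)
// this condition checks whether the key being inserted is the same key as p's (p is the first node of the list in this slot): the hashes must match AND the keys must be == or equals
// if so, p is assigned to e; otherwise we move on to the else branches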
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// this branch can be skipped for now; interested readers can study the difference between TreeNode and Node
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
// go straight to the case where the first node's key is not the same key
else {
// this loop walks toward the node whose next is null (the last node of the list at table index 8)
for (int binCount = 0; ; ++binCount) {
// the end of the list was reached without a match: link the new node at the tail
if ((e = p.next) == null) {
// p is the node whose next is null; point its next at the newly created node
// which extends the singly linked list
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
// otherwise check whether this node's key is the same as the key being inserted (same hash, and == or equals)
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
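To summarise the code above:
First we check whether the first node of the list has the same key as the one being inserted, meaning the hashes match and the keys are == or equals (a matching hash alone is not enough). If it does, that node is assigned to e. If not, we walk the list: if a node with the same key is found, it is assigned to e; if the walk reaches the end without a match, the new node is linked at the tail.
In other words, if the list already contains the key, the existing node is picked out; if it does not, the new node is appended to the end of the list.
Take the node with the same key (it may be null if no node with that key exists) and replace the old value
// e is the node with the duplicate key found above, or null if there was none
// if e is not null, the new value overwrites the old value (unless onlyIfAbsent is true and the old value is non-null), and the old value is returned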
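The walk summarised above can be reproduced in a few lines. The class below is a simplified sketch written for this article, not the JDK implementation: TinyChainedMap is a made-up name, the table is fixed at 16 slots, and resizing, treeification and the onlyIfAbsent flag are left out on purpose.

```java
public class TinyChainedMap<K, V> {
    // Simplified counterpart of HashMap.Node: hash, key, value and a next pointer.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed length, never resized
    private int size;

    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for key, or null if the key was not present.
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        Node<K, V> p = table[i];
        if (p == null) {                       // empty slot: the new node becomes the head
            table[i] = new Node<>(hash, key, value);
            size++;
            return null;
        }
        while (true) {
            K k;
            // same key: hashes match AND the keys are == or equals
            if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
                V old = p.value;
                p.value = value;               // replace the value, keep the node
                return old;
            }
            if (p.next == null) {              // end of the list: append at the tail
                p.next = new Node<>(hash, key, value);
                size++;
                return null;
            }
            p = p.next;
        }
    }

    public int size() { return size; }

    public static void main(String[] args) {
        TinyChainedMap<Integer, String> m = new TinyChainedMap<>();
        System.out.println(m.put(8, "a"));   // null: slot 8 was empty
        System.out.println(m.put(24, "b"));  // null: also slot 8, different key, appended
        System.out.println(m.put(8, "c"));   // a: same key, value replaced
        System.out.println(m.size());        // 2
    }
}
```

Keys 8 and 24 share slot 8, yet both are stored because their keys differ. Only the repeated key 8 hits the replace path.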
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
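The code above means: if e is not null, a node with the same key already exists, so the new value replaces the old one and the old value is returned. For HashSet that returned value is PRESENT, which is not null, so add returns false.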
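The replace-and-return behaviour is easy to observe on a real HashMap. The sketch below uses only the public Map API; the class name PutReturnDemo and the method run are made up for illustration. putIfAbsent is included because it shows the other side of the onlyIfAbsent check: an existing non-null value is kept.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    // Returns: result of first put, second put, putIfAbsent, and the final stored value.
    public static String[] run() {
        Map<String, String> map = new HashMap<>();
        String r1 = map.put("k", "v1");         // no previous mapping, returns null
        String r2 = map.put("k", "v2");         // key exists, value replaced, returns "v1"
        String r3 = map.putIfAbsent("k", "v3"); // key exists with a non-null value, nothing changes, returns "v2"
        return new String[] { r1, r2, r3, map.get("k") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints "[null, v1, v2, v2]"
    }
}
```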
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
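The remaining lines matter less: modCount and size are incremented, and the table is resized once size exceeds threshold. When there is no duplicate key, null is returned, which is why HashSet.add returns true for a new element.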
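To tie this back to the original question of how a set stays free of duplicates, here is a small sketch with a deliberately bad key type. Key is a hypothetical class written for this example: every instance returns the same hashCode, so all of them land in the same slot, and only equals separates them.

```java
import java.util.HashSet;
import java.util.Set;

public class CollidingKeyDemo {
    // Hypothetical key type: constant hashCode, equality decided by name.
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    public static int distinctCount() {
        Set<Key> set = new HashSet<>();
        set.add(new Key("a"));
        set.add(new Key("b")); // same hash, not equal: stored as a second node in the same slot
        set.add(new Key("a")); // same hash and equal: treated as a duplicate, set unchanged
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

An identical hash does not make two keys the same; the equals check in putVal has the final say.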