1. Translation of the HashMap Javadoc
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
A hash-table-based implementation of the Map interface. It provides all of the optional
map operations and permits null values and the null key. Aside from being unsynchronized
and permitting nulls, HashMap is roughly equivalent to Hashtable. The class makes no
guarantee about the order of the map; in particular, it does not guarantee that the order
will remain constant over time.
*
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
* Assuming the hash function disperses the elements properly among the buckets, this
implementation offers constant-time performance for the basic get and put operations.
Iterating over the collection views takes time proportional to the number of buckets
(the capacity) of the HashMap plus the number of key-value mappings (the size). So if
iteration performance matters, do not set the initial capacity too high or the load
factor too low.
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
An instance of HashMap has two parameters that affect its performance: the initial
capacity and the load factor. The capacity is the number of buckets in the hash table,
and the initial capacity is simply the capacity at the time the table is created. The
load factor measures how full the table is allowed to get before its capacity is
automatically increased. When the number of entries exceeds the product of the load
factor and the current capacity, the table is rehashed (that is, its internal data
structures are rebuilt) to roughly twice the number of buckets.
*
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
Generally, the default load factor (0.75) offers a good trade-off between time and
space costs. Higher values reduce the space overhead but increase the lookup cost
(reflected in most HashMap operations, including get and put). The expected number of
entries in the map and its load factor should both be taken into account when choosing
the initial capacity, so as to minimize the number of rehash operations. If the initial
capacity is greater than the maximum number of entries divided by the load factor, no
rehash will ever occur.
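The last rule above can be turned into a small sizing helper. The sketch below assumes nothing beyond the paragraph it follows; `capacityFor` is a hypothetical helper, not a JDK method (Guava ships a similar `Maps.newHashMapWithExpectedSize`):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacitySizing {
    // Hypothetical helper: smallest capacity such that expectedEntries
    // never exceeds capacity * loadFactor, so no rehash ever occurs.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    public static void main(String[] args) {
        // ceil(100 / 0.75) = 134; HashMap rounds this up to the next
        // power of two internally, so 100 puts never trigger a resize.
        Map<String, Integer> m = new HashMap<>(capacityFor(100, 0.75f));
        for (int i = 0; i < 100; i++) m.put("key-" + i, i);
        System.out.println(capacityFor(100, 0.75f) + " " + m.size()); // 134 100
    }
}
```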
*
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
If a HashMap will store many mappings, creating it with a sufficiently large capacity is
more efficient than letting it grow the table by rehashing on demand. Note that many
keys sharing the same hashCode() is a sure way to slow down any hash table. To soften
that impact, when keys implement Comparable, HashMap may use the comparison order among
keys to break ties among colliding keys.
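The collision handling is easy to observe. The sketch below (with a hypothetical `BadKey` class) forces every key into one bin; because the key is Comparable, JDK 8+ can still treeify the bin, answer lookups correctly, and keep them at O(log n) rather than O(n):

```java
import java.util.HashMap;
import java.util.Map;

public class CollidingKeys {
    // Illustrative worst-case key: every instance collides, but Comparable
    // lets the treeified bin keep O(log n) lookups instead of O(n).
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // total collision
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> m = new HashMap<>();
        for (int i = 0; i < 64; i++) m.put(new BadKey(i), i);
        System.out.println(m.get(new BadKey(37)) + " " + m.size()); // 37 64
    }
}
```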
*
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
*
Note that this implementation is not synchronized. If multiple threads access a HashMap
concurrently and at least one of them modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated with a key the
instance already contains is not a structural modification.) This is typically
accomplished by synchronizing on some object that naturally encapsulates the map.
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
If no such object exists, the map should be wrapped using the
Collections.synchronizedMap method. This is best done at creation time, to prevent
accidental unsynchronized access to the map:
Map m = Collections.synchronizedMap(new HashMap(...));
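A minimal sketch of the wrapping idiom. Note that iterating a collection view must still be synchronized manually on the wrapper, a requirement the Collections.synchronizedMap documentation itself states:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncWrapDemo {
    static int sumUnderLock() {
        // Wrap at creation time so no unsynchronized reference ever escapes.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);
        // Individual calls are synchronized by the wrapper, but iterating a
        // collection view still requires holding the wrapper's lock manually.
        int sum = 0;
        synchronized (m) {
            for (int v : m.values()) sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumUnderLock()); // 3
    }
}
```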
*
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
*
The iterators returned by all of this class's collection-view methods are fail-fast: if
the map is structurally modified at any time after the iterator is created, in any way
other than through the iterator's own remove method, the iterator throws
ConcurrentModificationException. Faced with concurrent modification, the iterator thus
fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at
some undetermined time in the future.
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
*
Note that the fail-fast behavior of an iterator cannot be guaranteed as such; generally
speaking, no hard guarantee is possible in the presence of unsynchronized concurrent
modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort
basis. It is therefore wrong to write a program that depends on this exception for its
correctness: the fail-fast behavior should be used only to detect bugs.
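The fail-fast behavior is easy to trigger even from a single thread. The sketch below structurally modifies the map while a view iterator is live; on typical JDKs this throws ConcurrentModificationException, though per the paragraph above it is only best-effort:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    static boolean failsFast() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        try {
            for (String k : m.keySet()) {
                m.put("c", 3); // structural modification while iterating
            }
        } catch (ConcurrentModificationException e) {
            return true; // the live iterator noticed the modCount change
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(failsFast()); // true on typical JDKs (best-effort)
    }
}
```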
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Doug Lea
* @author Josh Bloch
* @author Arthur van Hoff
* @author Neal Gafter
* @see Object#hashCode()
* @see Collection
* @see Map
* @see TreeMap
* @see Hashtable
* @since 1.2
*/
2. The HashMap put source code
Compared with Hashtable, HashMap first computes a better key hash by folding the high 16 bits into the calculation up front. In the linked-list stage HashMap appends at the tail, whereas Hashtable inserts at the head. Once a bin passes a threshold, HashMap converts its list into a red-black tree, bounding the worst-case lookup, update, and delete cost at O(log N).
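Before diving into the source, a small reminder of put's observable contract: it returns the previous value (or null), and unlike Hashtable it accepts null keys and values:

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    static Integer secondPut() {
        Map<String, Integer> m = new HashMap<>();
        Integer first = m.put("a", 1);   // null: there was no previous mapping
        Integer second = m.put("a", 2);  // 1: the old value comes back
        m.put(null, 0);                  // a null key is fine, unlike Hashtable
        return first == null ? second : -1;
    }

    public static void main(String[] args) {
        System.out.println(secondPut()); // 1
    }
}
```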
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
An int hash is 32 bits, and table indexing masks off everything above the low bits: the
capacity tops out at 2^30 and in practice stays far smaller for a long time, so without
this transform the high 16 bits would rarely influence the bucket index. XOR-ing the high
half into the low half spreads their impact downward as cheaply as possible. Note that
>>> (three angle brackets) is the unsigned, zero-fill right shift, not a sign-preserving
shift, so the top 16 bits are brought down regardless of sign.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
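The effect of the spreading step can be seen with two hash codes that differ only above the low 4 bits. With a 16-bucket table the raw hashes collide, while the spread hashes do not (illustrative values, not hashes of real keys):

```java
public class HashSpread {
    // The same spreading step as JDK 8+ HashMap.hash for non-null keys.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1;                  // a 16-bucket table indexes with & 15
        int h1 = 0x10000, h2 = 0x20000;     // differ only above the mask
        // Raw hashes both land in bucket 0; spread hashes separate.
        System.out.println((h1 & mask) + " " + (h2 & mask));                 // 0 0
        System.out.println((spread(h1) & mask) + " " + (spread(h2) & mask)); // 1 2
    }
}
```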
/**
Stores the given key-value mapping.
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
//lazily initialize the table; the default capacity is 16
n = (tab = resize()).length;
//locate the bucket via (n - 1) & hash (n is a power of two, so n - 1 is the index mask); if it is empty, build the first node
if ((p = tab[i = (n - 1) & hash]) == null)
//create a node with the key, the value, and next = null, and store it at the computed index
tab[i] = newNode(hash, key, value, null);
else {
//the bucket is already occupied: resolve the collision
Node<K,V> e; K k;
//same hash and equal key: remember this node as the one whose value will be replaced
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
//is the bin already a tree?
else if (p instanceof TreeNode)
//if so, insert into the red-black tree
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
//a different key hashed into the same bucket: walk the linked list
for (int binCount = 0; ; ++binCount) {
//this loop traverses the list to its tail
if ((e = p.next) == null) {
//append a new node at the tail (tail insertion)
p.next = newNode(hash, key, value, null);
//once the list reaches TREEIFY_THRESHOLD (default 8) nodes, convert the bin to a tree
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
//treeify the bin
treeifyBin(tab, hash);
break;
}
//found an existing node with the same key
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
//keep walking
p = e;
}
}
//the key already exists in the table: replace its value
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
//a value-only change returns here, so modCount below is not incremented
return oldValue;
}
}
//count of structural modifications
++modCount;
//rebuild the table once the size exceeds the threshold
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
/**
* Replaces all linked nodes in bin at index for given hash unless
* table is too small, in which case resizes instead.
Treeifies the bin selected by the given hash.
*/
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
//if the table is still smaller than MIN_TREEIFY_CAPACITY (64, at least 4 * TREEIFY_THRESHOLD),
//resize instead of treeifying, to avoid conflicts between treeification and table growth
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
do {
//replace the plain node with a TreeNode
TreeNode<K,V> p = replacementTreeNode(e, null);
//tl is null only for the first node
if (tl == null)
//the head is the first list node visited
hd = p;
else {
//link the nodes into a doubly linked list
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
if ((tab[index] = hd) != null)
//build the actual tree
hd.treeify(tab);
}
}
/**
* Forms tree of the nodes linked from this node.
Turns all nodes linked from this node into a tree.
* @return root of tree
*/
final void treeify(Node<K,V>[] tab) {
TreeNode<K,V> root = null;
//walk the list from this node forward
for (TreeNode<K,V> x = this, next; x != null; x = next) {
//save the next node
next = (TreeNode<K,V>)x.next;
//clear the child links
x.left = x.right = null;
//root == null means the first node visited becomes the root
if (root == null) {
//the root has no parent
x.parent = null;
//the root must be black
x.red = false;
//set the root
root = x;
}
else {
K k = x.key;
int h = x.hash;
Class<?> kc = null;
//search downward from the root for the insertion point
for (TreeNode<K,V> p = root;;) {
//ph is p's hash
int dir, ph;
K pk = p.key;
//ph > h: descend into the left subtree
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0)
dir = tieBreakOrder(k, pk);
TreeNode<K,V> xp = p;
if ((p = (dir <= 0) ? p.left : p.right) == null) {
x.parent = xp;
if (dir <= 0)
xp.left = x;
else
xp.right = x;
//everything above is plain binary-search-tree insertion; inserting a node can break
//the red-black invariants, so the heart of the algorithm comes next:
//balanceInsertion takes the root and the new node and returns the (possibly new) root
root = balanceInsertion(root, x);
break;
}
}
}
}
moveRootToFront(tab, root);
}
//left rotation
static <K,V> TreeNode<K,V> rotateLeft(TreeNode<K,V> root,
TreeNode<K,V> p) {
TreeNode<K,V> r, pp, rl;
//nothing to do unless p and p's right child are both non-null
if (p != null && (r = p.right) != null) {
//move r's left subtree over to be p's right subtree, freeing r to be pulled up
if ((rl = p.right = r.left) != null)
//besides repointing p.right, rl's parent must become p
rl.parent = p;
//pull r up: r's parent becomes p's old parent; if that parent is null, r is the new root and must be black
if ((pp = r.parent = p.parent) == null)
(root = r).red = false;
//if p was its parent's left child, that slot now holds r
else if (pp.left == p)
pp.left = r;
//otherwise p was the right child, so that slot now holds r
else
pp.right = r;
//finally p sinks to become r's left child; the steps above merely rewired the parent and child links
r.left = p;
p.parent = r;
}
return root;
}
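The pointer surgery is easier to follow on a stripped-down node type. The sketch below (a hypothetical `Node` class, with colors omitted) performs the same left rotation as `TreeNode.rotateLeft`: rotating around the head of the chain 1→2→3 pulls 2 up as the new root:

```java
public class RotateDemo {
    // Minimal BST node, an illustrative stand-in for HashMap.TreeNode.
    static final class Node {
        int key;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    // Left-rotate around p; returns the (possibly new) root.
    // Same pointer surgery as TreeNode.rotateLeft, minus color handling.
    static Node rotateLeft(Node root, Node p) {
        Node r, pp, rl;
        if (p != null && (r = p.right) != null) {
            if ((rl = p.right = r.left) != null)
                rl.parent = p;            // r's left subtree becomes p's right
            if ((pp = r.parent = p.parent) == null)
                root = r;                 // p was the root; r replaces it
            else if (pp.left == p)
                pp.left = r;
            else
                pp.right = r;
            r.left = p;                   // p sinks to be r's left child
            p.parent = r;
        }
        return root;
    }

    public static void main(String[] args) {
        //   1                  2
        //    \       ->       / \
        //     2              1   3
        //      \
        //       3
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.right = b; b.parent = a;
        b.right = c; c.parent = b;
        Node root = rotateLeft(a, a);
        System.out.println(root.key + " " + root.left.key + " " + root.right.key); // 2 1 3
    }
}
```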
//right rotation
static <K,V> TreeNode<K,V> rotateRight(TreeNode<K,V> root,
TreeNode<K,V> p) {
TreeNode<K,V> l, pp, lr;
//nothing to do unless p and p's left child are both non-null
if (p != null && (l = p.left) != null) {
//detach l's right subtree and attach it as p's left subtree
if ((lr = p.left = l.right) != null)
//the detached subtree's parent becomes p
lr.parent = p;
//link l to p's old parent
if ((pp = l.parent = p.parent) == null)
//if that parent is null, l is the new root and must be black
(root = l).red = false;
//if p was its parent's right child, that slot now holds l
else if (pp.right == p)
pp.right = l;
//if p was the left child, that slot now holds l
else
pp.left = l;
//p sinks to become the right child of its former left child
l.right = p;
//and p's parent is now l
p.parent = l;
}
return root;
}
/**
(1) Every node is either red or black.
(2) The root is black.
(3) Every leaf (the NIL/null sentinel at the ends of the tree) is black.
(4) If a node is red, both of its children are black.
(5) From any node, every path down to a descendant NIL leaf contains the same number of black nodes. **/
static <K,V> TreeNode<K,V> balanceInsertion(TreeNode<K,V> root,
TreeNode<K,V> x) {
//a newly inserted node starts out red
x.red = true;
//the four loop variables, left to right: x's parent, x's grandparent,
//the grandparent's left child, and the grandparent's right child (one of the last two is x's uncle)
for (TreeNode<K,V> xp, xpp, xppl, xppr;;) {
//if x has no parent, x is the root: color it black and return it
if ((xp = x.parent) == null) {
x.red = false;
return x;
}
//if the parent is black, adding a red node cannot violate any invariant, so return;
//likewise if there is no grandparent, the parent is the black root and a red insert is safe
else if (!xp.red || (xpp = xp.parent) == null)
return root;
//past the filters above: x is red, its parent is red, and a grandparent exists;
//this branch handles a parent that is the grandparent's left child
if (xp == (xppl = xpp.left)) {
//case 1: the uncle exists and is red (the parent is red as well)
if ((xppr = xpp.right) != null && xppr.red) {
//property (4) is violated: recolor the parent and uncle black and the grandparent red
xppr.red = false;
xp.red = false;
xpp.red = true;
//then continue the loop with x moved up to the grandparent
x = xpp;
}
//case 2: the uncle is null or black while the parent is red
else {
//x is a right child
if (x == xp.right) {
//left-rotate around the parent; x becomes the old parent
root = rotateLeft(root, x = xp);
//after the rotation the old xp has become x's left child, so recompute xp and xpp
xpp = (xp = x.parent) == null ? null : xp.parent;
}
//x is (now) a left child
if (xp != null) {
//color the parent black
xp.red = false;
if (xpp != null) {
//and the grandparent red
xpp.red = true;
//then right-rotate around the grandparent
root = rotateRight(root, xpp);
}
}
}
}
else {
//mirror image of the cases above, for a parent that is the grandparent's right child
if (xppl != null && xppl.red) {
xppl.red = false;
xp.red = false;
xpp.red = true;
x = xpp;
}
else {
if (x == xp.left) {
root = rotateRight(root, x = xp);
xpp = (xp = x.parent) == null ? null : xp.parent;
}
if (xp != null) {
xp.red = false;
if (xpp != null) {
xpp.red = true;
root = rotateLeft(root, xpp);
}
}
}
}
}
}
//initializes the table or doubles its capacity
final Node<K,V>[] resize() {
//keep a reference to the old table
Node<K,V>[] oldTab = table;
//null before initialization, so oldCap is 0
int oldCap = (oldTab == null) ? 0 : oldTab.length;
//save the old threshold
int oldThr = threshold;
//the new capacity and new threshold start at 0
int newCap, newThr = 0;
if (oldCap > 0) {
//already at the maximum capacity, 2^30?
if (oldCap >= MAXIMUM_CAPACITY) {
//pin the threshold at Integer.MAX_VALUE and return the old table
threshold = Integer.MAX_VALUE;
return oldTab;
}
//if doubling stays below the maximum and the old capacity is at least the default initial capacity,
//double the threshold as well
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
//oldCap == 0 but oldThr > 0: the table is not initialized yet, and the constructor
//stashed the requested initial capacity in threshold, so use it as the new capacity
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signifies using defaults: a no-argument new HashMap()
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); //note: without an explicit load factor the default 0.75 is used, and every later threshold grows from it
}
//newThr is still 0 when the map was created via HashMap(int) or HashMap(int, float); compute it from the new capacity
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
//publish the new threshold
threshold = newThr;
//allocate the new bucket array
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
//publish it as this map's table
table = newTab;
//a non-null old table means this is a resize rather than initialization: move the old entries
if (oldTab != null) {
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
if (e.next == null)
//a single node: recompute its slot directly (the capacity is a power of two)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
//a tree bin: split it between the low and high positions
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
//the list is split while preserving its original order:
//a "lo" list stays at index j and a "hi" list moves to index j + oldCap
//head and tail of the lo list
Node<K,V> loHead = null, loTail = null;
//head and tail of the hi list
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
//testing hash & oldCap (note: the old capacity itself, not capacity - 1 as when indexing)
//checks the one new index bit: 0 means the node stays in the low list
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
//the lo list stays at the original index
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
//the hi list moves up by oldCap
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
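The lo/hi split has a simple arithmetic reading: after doubling, a node's new index equals its old index if the bit tested by `hash & oldCap` is 0, and old index + oldCap otherwise, which is exactly `hash & (newCap - 1)`. A small sketch:

```java
public class ResizeSplit {
    // New index after doubling a power-of-two table of size oldCap.
    static int newIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);                  // index in the old table
        return (hash & oldCap) == 0 ? j : j + oldCap; // lo stays, hi moves up
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // Hashes 5 and 21 share old bucket 5; after doubling, 5 stays at 5
        // and 21 (bit 16 set) moves to 5 + 16 = 21, which equals 21 & (32 - 1).
        System.out.println(newIndex(5, oldCap) + " " + newIndex(21, oldCap)); // 5 21
        System.out.println((5 & (oldCap - 1)) + " " + (21 & (oldCap - 1)));   // 5 5
    }
}
```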
The red-black balancing code is best read alongside the underlying theory; the JDK implementation is quite compact. For a textbook-style analysis (following Introduction to Algorithms), see:
https://www.cnblogs.com/Anker/archive/2013/01/30/2882773.html