HashMap的底层源码
HashMap算是我被我当成对象最多的Map了,用的多了就会产生好奇,到底它底层是怎么样的,今天我来一睹芳容。
代码是在是太多了,不过没关系,看中文就行
```java
package java.util;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.Serializable;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.function.Function;
/**
* Hash table based implementation of the <tt>Map</tt> interface. This
* implementation provides all of the optional map operations, and permits
* <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
* class is roughly equivalent to <tt>Hashtable</tt>, except that it is
* unsynchronized and permits nulls.) This class makes no guarantees as to
* the order of the map; in particular, it does not guarantee that the order
* will remain constant over time.
* 这一段内容主要是说:HashMap实现了Map接口,并能执行所有Map接口的可执行操作(废话),HashMap允许
* null作为值和nmull作为键。HashMap除了①允许null键值②线程不安全这两个特点外,基本是可以和HashTable等价。
* HashMap不保证map的顺序,尤其是不保证数据持久不变。
* <p>This implementation provides constant-time performance for the basic
* operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
* disperses the elements properly among the buckets. Iteration over
* collection views requires time proportional to the "capacity" of the
* <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
* of key-value mappings). Thus, it's very important not to set the initial
* capacity too high (or the load factor too low) if iteration performance is
* important.
* 假定hash函数把元素适当地分部在buckets中,那么该实现为基本的操作(如get和put)提供了恒定的性能。
* 迭代集合的视图所需时间与HashMap实例的的buckets数量和HashMap的容量(也就是键值对映射的数量)息息相关
* 因此,如果想要获得高效的迭代效率,不要设置太高的初始容量(或太低的负载因子)。这里很多人都只是使用了
* HashMap的无参构造,其实HashMap还有有参构造的
* *********************************************
* HashMap(int initialCapacity, float loadFactor)
* HashMap( int initialCapacity)
* HashMap( <no parameters> )
* HashMap(aNotNull Map<? extends String, ?> m)
* 也就是说不要把initialCapacity设置太高和把loadFactor设置太低
* *********************************************
* <p>An instance of <tt>HashMap</tt> has two parameters that affect its
* performance: <i>initial capacity</i> and <i>load factor</i>. The
* <i>capacity</i> is the number of buckets in the hash table, and the initial
* capacity is simply the capacity at the time the hash table is created. The
* <i>load factor</i> is a measure of how full the hash table is allowed to
* get before its capacity is automatically increased. When the number of
* entries in the hash table exceeds the product of the load factor and the
* current capacity, the hash table is <i>rehashed</i> (that is, internal data
* structures are rebuilt) so that the hash table has approximately twice the
* number of buckets.
* *********************************************
* 每个HashMap有两个影响它性能的参数:initial capacity和load factor。capacity就是hash表中的
* buckets的数量,并且仅仅是创建时的容量;load factor 是用来衡量hash表中允许hashMap在容量自动增长前的
* 饱和程度(也就是相当于饱和度的意思。比如我设置了初始容量为10,饱和度为0.9,那么这个hashMap实例在数量超过9
* 之后才会进行自动扩容(rehashd):也就是说,内部的数据结构被重构,所以hash表的buckets接近是原来的两倍了)。
* 从这里再去看上面那句话"不要设置太高的初始容量和太低的饱和因子",如果我是new HashMap<String>(10,0.2),
* 那么我只要存三个值之后,该HashMap实例就扩容和重构,实例的体积就加倍了,尤其影响效率。
* *********************************************
* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs. Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>). The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations. If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.
* *********************************************
* 一般来说,初始默认的负载因子(0.75)就是一般使用中对于时间和容量较为合适的权重了,
* 高空间就意味着高查找时的高消耗(在包括get和put等的HashMap类中的操作中都可感受到)
* 当初始化HashMap的时候,预期的容量和负载因子都需要列入考虑范围,借此来减少rehashed操作。
* 如果初始容量(默认是16)大于最大条目数(默认是2的30次方)/(除以)负载因子,
* 那么rehashed将永远不会发生。(也就是说如果初始化的数量大于2的30次方/0.75),那么该hashMap
* 不会再重新扩容。
* *********************************************
* <p>If many mappings are to be stored in a <tt>HashMap</tt>
* instance, creating it with a sufficiently large capacity will allow
* the mappings to be stored more efficiently than letting it perform
* automatic rehashing as needed to grow the table. Note that using
* many keys with the same {@code hashCode()} is a sure way to slow
* down performance of any hash table. To ameliorate impact, when keys
* are {@link Comparable}, this class may use comparison order among
* keys to help break ties.
* *********************************************
* 建议:如果有大量的数据要存储在HashMap的实例中,那么最好直接创建大容量满足使用的实例,这样做会比让实例自己扩容和
* rehashed要有更高的效率。使用多个具有相同hashCode(hash码或称hash值)的键无疑会降低该hash表的性能。
* 为了改善这个问题,当键是可比较的时候(int类型的数据是无法使用Comparator的compareTo方法的),使用比较厚的顺序可以
* 大大改善这个问题。意思是说使用具有一定顺序的键能增强性能。
* *********************************************
* <p><strong>Note that this implementation is not synchronized.</strong>
* If multiple threads access a hash map concurrently, and at least one of
* the threads modifies the map structurally, it <i>must</i> be
* synchronized externally. (A structural modification is any operation
* that adds or deletes one or more mappings; merely changing the value
* associated with a key that an instance already contains is not a
* structural modification.) This is typically accomplished by
* synchronizing on some object that naturally encapsulates the map.
* *********************************************
* 需要明确的是,该实现(也就是指HashMap)不是同步的(也就是说是线程不安全的),如果多线程同时访问
* 同一个hashMap,且至少有一个线程在试图做修改操作,那就必须加锁。(结构性该表指的任何新增或删除操作,
* 而仅仅改变某个键对应的值不算)。对封装就map的对象上锁本就是一个好的习惯。
* *********************************************
* If no such object exists, the map should be "wrapped" using the
* {@link Collections#synchronizedMap Collections.synchronizedMap}
* method. This is best done at creation time, to prevent accidental
* unsynchronized access to the map:<pre>
* Map m = Collections.synchronizedMap(new HashMap(...));</pre>
* *********************************************
* 当HashMap中没有对象存在时,最好在创建HashMap是使用Collection.syncchronizedMap
* 方法创建,
* Map<String, Integer> synchronizedMap = Collections.synchronizedMap(new HashMap<String, Integer>());
* 这样做可以阻止意外非同步访问map。意思是说,如果map是新的map,最好是用 Collections.synchronizedMap来创建。
* *********************************************
* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}. Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.
* *********************************************
* HashMap在迭代器做迭代的时候,除了自带的remove方法,任何结构性的改变(结构性改变上边有说道)都会时迭代操作
* 立马失败,而且会抛出“ConcurrentModificationException”,并发修改异常。就是迭代的时候不能同时做修改操作,
* 自带的remove是可以的。因此,并发修改将会立马报错,而不是将该隐患携带着继续进行迭代。
* *********************************************
* <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
* as it is, generally speaking, impossible to make any hard guarantees in the
* presence of unsynchronized concurrent modification. Fail-fast iterators
* throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
* Therefore, it would be wrong to write a program that depended on this
* exception for its correctness: <i>the fail-fast behavior of iterators
* should be used only to detect bugs.</i>
* *********************************************
* 迭代的快速失败是无法保证的,也就是说无法保证不存在非同步并发修改,快速失败跌打抛出ConcurrentModificationException
* 异常只是已经是做的最大的努力了。因此,想基于抛出快速失败跌打抛出ConcurrentModificationException
* 异常来来进行编程判断正确性显然是不可靠的,这个快速失败的机制只应该用于发现代码中的bug。
* *********************************************
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @param <K> the type of keys maintained by this map
* @param <V> the type of mapped values
*
* @author Doug Lea
* @author Josh Bloch
* @author Arthur van Hoff
* @author Neal Gafter
* @see Object#hashCode()
* @see Collection
* @see Map
* @see TreeMap
* @see Hashtable
* @since 1.2
*/
public class HashMap<K,V> extends AbstractMap<K,V>
implements Map<K,V>, Cloneable, Serializable {
private static final long serialVersionUID = 362498820763181265L;
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
* *********************************************
* HashMap是基于hash表,但是当这个容器过大的时候,每个节点就会和java.util.TreeMap
* 的节点相似,大多数方法都视图使用普通的容器,但是转化成TreeNode也是可行的(只需简单地检查
* 几点的实例)。
* *********************************************
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
* *********************************************
* 数容器是根据hash码进行排序,但是如果hash值重复了,如果两个元素是同一个类而且
* 可以实现Comparable接口,那么将会使用compareTo方法进行排序。保守地通过反射来
* 检查泛型(参见“comparableClassFor”方法)
* *********************************************
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
*
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/
/**
* The default initial capacity - MUST be a power of two.
* 默认的初始容量,必须是2的幂
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* ----------------------------------------------------------------------
* 这里就是hashMap的默认初始容量。1<<4就是移位运算,0000 0001 → 0001 0000
* 0000 0001向左移位4就变成16.可以理解为1*2^4
* ----------------------------------------------------------------------
*/
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* ----------------------------------------------------------------------
* 最大的容量2的30次方
* ----------------------------------------------------------------------
*/
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* ----------------------------------------------------------------------
* 默认的负载因子0.75
* ----------------------------------------------------------------------
*/
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
/* ---------------- Static utilities -------------- */
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
/**
* ----------------------------------------------------------------------
* 当键位null时,hash方法返回0,否则就会返回一个新的hash值。也就是说,允许一个null
* 作为键,多个null的话从第二个null开始,将会指向同一个地方,也就是值被覆盖
* ----------------------------------------------------------------------
*/
/**
/**
* Returns x's Class if it is of the form "class C implements
* Comparable<C>", else null.
*/
static Class<?> comparableClassFor(Object x) {
if (x instanceof Comparable) {
Class<?> c; Type[] ts, as; Type t; ParameterizedType p;
if ((c = x.getClass()) == String.class) // bypass checks
return c;
if ((ts = c.getGenericInterfaces()) != null) {
for (int i = 0; i < ts.length; ++i) {
if (((t = ts[i]) instanceof ParameterizedType) &&
((p = (ParameterizedType)t).getRawType() ==
Comparable.class) &&
(as = p.getActualTypeArguments()) != null &&
as.length == 1 && as[0] == c) // type arg is c
return c;
}
}
}
return null;
}
/**
* ----------------------------------------------------------------------
* 解决hash碰撞,使用比较后的顺序解决
* ----------------------------------------------------------------------
*/
/**
/**
* Returns k.compareTo(x) if x matches kc (k's screened comparable
* class), else 0.
*/
@SuppressWarnings({"rawtypes","unchecked"}) // for cast to Comparable
static int compareComparables(Class<?> kc, Object k, Object x) {
return (x == null || x.getClass() != kc ? 0 :
((Comparable)k).compareTo(x));
}
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
/* ---------------- Fields -------------- */
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;
/* ---------------- Public operations -------------- */
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
*/
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
/**
* Constructs an empty <tt>HashMap</tt> with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
/**
* Constructs an empty <tt>HashMap</tt> with the default initial capacity
* (16) and the default load factor (0.75).
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
/**
* Constructs a new <tt>HashMap</tt> with the same mappings as the
* specified <tt>Map</tt>. The <tt>HashMap</tt> is created with
* default load factor (0.75) and an initial capacity sufficient to
* hold the mappings in the specified <tt>Map</tt>.
*
* @param m the map whose mappings are to be placed in this map
* @throws NullPointerException if the specified map is null
*/
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
总之记住几个重要的结论:
1.HashMap有无参构造,也有有参构造(初始容量和负载因子),默认初始容量16,负载因子0.75(double类型的)。
2.HashMap的最大容量为2的30次方。
3.使用中如果要存储很大的数据,最好一次性声明初始化容量,不要使用无参构造来让它自动resize,这样效率高一点。
HashMap是基于hash()来做存储和取值的
get和put都会调用hash(key)方法
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
4.HashMap是非线程安全的,如果是首次初始化HashMap,然后又有安全性需求,那就是用
Collections.synchronizedMap(new HashMap<>())
创建线程安全的map(将会向上转型成Map)
5.HashMap的子类LinkedHashMap是有序的,基于链表实现吗,顺序取决于插入顺序(链表的先进先出)。