A Detailed Look at the HashMap Source Code

chengbinbbs

Original article: https://blog.csdn.net/chengbinbbs/article/details/79231883

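HashMap is a commonly used key-value storage structure. Its main characteristics are:
1. Both key and value may be null (there can be at most one null key).
2. Keys are unique: putting a key that already exists overwrites its value. Values themselves may repeat.
3. Elements have no guaranteed order.
4. It is not thread-safe.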
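
A minimal sketch that exercises the first two characteristics through the public java.util.HashMap API. The class and method names (HashMapTraitsDemo, overwrite, nullsAllowed) are made up for this illustration:

```java
import java.util.HashMap;
import java.util.Map;

class HashMapTraitsDemo {
    /** Puts the same key twice and returns what the second put() reports as the replaced value. */
    static String overwrite() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");
        return map.put("a", "2"); // put returns the previous value for the key
    }

    /** Stores a null key and a null value and returns the resulting number of mappings. */
    static int nullsAllowed() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "x"); // the single null key
        map.put("k", null); // a null value is accepted as well
        map.put(null, "y"); // overwrites the value stored under the null key
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(overwrite());    // 1
        System.out.println(nullsAllowed()); // 2
    }
}
```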

With these characteristics in mind, let's look at how the source code (the JDK 1.7 version) implements them:

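Data structure

The basic building block of HashMap is the nested class Entry, which holds the fields key, value, next and hash. next points to the following node, so entries form a singly linked list: each node knows its successor but not its predecessor.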

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
    ...
}

Let's start by creating a HashMap:

public static void main(String[] args) {
    Map<String,String> map = new HashMap<>();
    map.put("a","1");
    map.put("b","2");
    map.put("c","3");
    map.put("d","4");
    ...

}

This calls the default no-argument constructor, so let's look at the constructors first:

    // Default initial capacity: 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * Maximum capacity
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * Default load factor
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * Shared empty Entry array, used until the table is inflated
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};

    /**
     * Number of key-value mappings in the map
     */
    transient int size;

    /**
     * Resize threshold = (capacity * load factor).
     * @serial
     */
    int threshold;

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

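As you can see, the constructor only records the defaults: an initial capacity of 16 (held temporarily in threshold) and a load factor of 0.75f. At this point the HashMap is still empty, and so is the Entry array. This is where JDK 1.7 differs from 1.6: in 1.6 the default constructor already allocated the Entry array, whereas the 1.7 code shown here initializes it lazily, on the first call to put. So let's look at the put method next.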

The put method

put is the method that adds an element:

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

It first checks whether the Entry array is still the empty table. On the first call it is, so inflateTable(threshold) is called to allocate it. Here is that method:

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
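
The two computations inside inflateTable can be tried in isolation. This is a sketch, not the JDK source: roundUpToPowerOf2 is written here on the assumption that it returns the smallest power of two not below its argument, capped at MAXIMUM_CAPACITY, and the class name InflateSketch is made up:

```java
class InflateSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two >= number, capped at MAXIMUM_CAPACITY (assumed behavior). */
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // For number > 1, doubling (number - 1) and keeping its highest bit rounds up.
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    /** Same expression as the threshold assignment in inflateTable. */
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
        System.out.println(threshold(16, 0.75f));  // 12
    }
}
```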

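The Entry array is now allocated with capacity = 16, and threshold becomes 16 * 0.75 = 12, the size at which the next resize may be triggered. Moving on: when the key is null, putForNullKey is called, which is why HashMap accepts a null key. Its entry always goes into table[0].
When the key is not null, the hash function is applied to the key to compute its hash value, and then indexFor(hash, table.length) is called: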

static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
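
A small sketch of why the bit mask in indexFor works. It compares h & (length-1) with Math.floorMod, a standard library method chosen here only for the comparison; the class name IndexForDemo and the helper matchesModulo are made up:

```java
class IndexForDemo {
    /** Same expression as HashMap.indexFor in the listing above. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /** True if the mask gives the same bucket as a non-negative modulo. */
    static boolean matchesModulo(int h, int length) {
        return indexFor(h, length) == Math.floorMod(h, length);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(17, 16));      // 1
        System.out.println(matchesModulo(-1, 16)); // true: works for negative hashes too
        System.out.println(matchesModulo(5, 10));  // false: 10 is not a power of two
    }
}
```

The equivalence only holds when length is a power of two, which is why inflateTable rounds the capacity up before allocating the array.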

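Because the table length is always a power of two, h & (length-1) is equivalent to taking the hash modulo the table length, and the result is the index of a bucket in the Entry array. The position depends only on the key's hash and not on insertion order, which is why HashMap has no guaranteed ordering.
put then walks the chain at that index. If a node with the same hash and an equal key is found, its value is overwritten and the old value is returned. Otherwise addEntry(hash, key, value, i) is called to add a new node: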

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

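On the first insertion size is 0, so the method goes straight to createEntry, which creates an Entry node at the corresponding index of the Entry array. If that bucket already holds an element, there is a hash collision, and the entries at that index form a linked list: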

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
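
The head insertion done by createEntry can be sketched with a stripped-down node type. Node, addFirst and keysInChainOrder are hypothetical names for this illustration and keep only the key and next fields:

```java
class HeadInsertDemo {
    /** Simplified stand-in for Entry: just a key and a link to the next node. */
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Like createEntry: the new node becomes the bucket head and points at the old head. */
    static Node addFirst(Node head, String key) {
        return new Node(key, head);
    }

    /** Concatenates the keys from the head of the chain to its tail. */
    static String keysInChainOrder(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.key);
        }
        return sb.toString();
    }

    /** Inserts a, b, c into one bucket and returns the resulting chain order. */
    static String demo() {
        Node bucket = null; // empty bucket
        bucket = addFirst(bucket, "a");
        bucket = addFirst(bucket, "b");
        bucket = addFirst(bucket, "c");
        return keysInChainOrder(bucket);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // cba: the most recently added node is first
    }
}
```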

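The new Entry's next field points to the bucket's previous head. On the first insertion the bucket was empty, so afterwards the Entry array holds one node whose next is null, and the HashMap's size is incremented by 1. Now consider the second call, map.put("b","2"):
The steps are the same: compute the hash from the key, derive the bucket index from the hash and the Entry array length, then call addEntry to add the node. If size has reached threshold and the target bucket is already occupied, the table is resized first: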

if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

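resize creates a new array twice as large as the old one and moves the existing entries into it. The hash and the bucket index of the entry being added are then recomputed against the new table length.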
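
A sketch of what doubling does to bucket indices, assuming the hash value of a key stays the same across the resize (the real code may also reseed the hash, which is ignored here). ResizeIndexDemo and staysOrMovesByOldLength are made-up names:

```java
class ResizeIndexDemo {
    /** Same expression as HashMap.indexFor. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    /**
     * After the table length doubles, one more hash bit takes part in the mask,
     * so an entry either keeps its index or moves up by exactly oldLength.
     */
    static boolean staysOrMovesByOldLength(int h, int oldLength) {
        int oldIndex = indexFor(h, oldLength);
        int newIndex = indexFor(h, 2 * oldLength);
        return newIndex == oldIndex || newIndex == oldIndex + oldLength;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(5, 16) + " -> " + indexFor(5, 32));   // 5 -> 5
        System.out.println(indexFor(21, 16) + " -> " + indexFor(21, 32)); // 5 -> 21
    }
}
```

Hashes 5 and 21 collide in a table of 16 buckets but are separated once the table has 32, which is how a resize shortens the chains.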

The remove method

Straight to the source:

public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }

final Entry<K,V> removeEntryForKey(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }

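Once put is understood, remove is easy to follow because it works the same way: compute the hash from the key, derive the bucket index from the hash and the array length, and walk the chain at that index until a node with an equal key is found. That node is then unlinked: if it is the head, table[i] is set to its successor, otherwise the previous node's next is pointed past it.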
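
The unlinking step can be sketched on a bare singly linked chain. Node, remove and keys are hypothetical names, and the hash comparison is left out to keep the example short:

```java
class ChainRemoveDemo {
    /** Simplified stand-in for Entry. */
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    /** Unlinks the first node with an equal key and returns the (possibly new) head. */
    static Node remove(Node head, String key) {
        Node prev = null;
        for (Node e = head; e != null; prev = e, e = e.next) {
            if (e.key.equals(key)) {
                if (prev == null) {
                    return e.next;  // removing the head: the bucket now starts at its successor
                }
                prev.next = e.next; // bypass e
                return head;
            }
        }
        return head; // key not present, chain unchanged
    }

    /** Concatenates the keys from head to tail. */
    static String keys(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.key);
        }
        return sb.toString();
    }

    /** Builds the chain c -> b -> a, removes one key and returns the remaining keys. */
    static String demo(String keyToRemove) {
        Node head = new Node("c", new Node("b", new Node("a", null)));
        return keys(remove(head, keyToRemove));
    }

    public static void main(String[] args) {
        System.out.println(demo("c")); // ba
        System.out.println(demo("b")); // ca
        System.out.println(demo("x")); // cba
    }
}
```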

The get method

get is simple as well:

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }

        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

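Compute the hash from the key, derive the bucket index from the hash and the array length, and walk the chain at that index until a node with an equal key is found. That node is returned and its value is read from it. If no node matches, get returns null.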
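
One consequence worth trying out: since get returns null both for a missing key and for a key mapped to null, containsKey is needed to tell the two apart. A short sketch against the public API, with a made-up class name GetNullDemo:

```java
import java.util.HashMap;
import java.util.Map;

class GetNullDemo {
    /** Returns { get(present)==null, get(absent)==null, containsKey(present), containsKey(absent) }. */
    static boolean[] demo() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null); // key exists, value is null
        return new boolean[] {
            map.get("present") == null,
            map.get("absent") == null,
            map.containsKey("present"),
            map.containsKey("absent")
        };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // true true true false
    }
}
```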

Why HashMap's table is transient

One very subtle detail:
transient Entry[] table;
HashMap is built on hashCode, and hashCode, as a method of Object, is native:

public native int hashCode();

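This means hashCode depends on the underlying implementation: for classes that do not override it, different virtual machines may compute it differently. To put it more plainly, the same key might have hashCode 1 on VM A, 2 on VM B and 3 on VM C.

That is a problem. Cross-platform portability has been Java's biggest selling point since its inception. If table were not marked transient, a map serialized on VM A could not be used after being deserialized on VM B, and portability would be lost, because:

1. On VM A the key's hashCode is 100, so it is linked into table[4]

2. On VM B the key's hashCode is 101, so the lookup goes to table[5] and obviously cannot find the key

and the whole program breaks. To avoid this, HashMap implements its own serialization of the table: writeObject writes the keys and values one after another at the end of the serialized stream: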

private void writeObject(java.io.ObjectOutputStream s)
        throws IOException
{
    Iterator<Map.Entry<K,V>> i =
        (size > 0) ? entrySet0().iterator() : null;

    // Write out the threshold, loadfactor, and any hidden stuff
    s.defaultWriteObject();

    // Write out number of buckets
    s.writeInt(table.length);

    // Write out size (number of Mappings)
    s.writeInt(size);

    // Write out keys and values (alternating)
    if (i != null) {
        while (i.hasNext()) {
            Map.Entry<K,V> e = i.next();
            s.writeObject(e.getKey());
            s.writeObject(e.getValue());
        }
    }
}

and readObject rebuilds the HashMap data structure when reading:

private void readObject(java.io.ObjectInputStream s)
        throws IOException, ClassNotFoundException
{
    // Read in the threshold, loadfactor, and any hidden stuff
    s.defaultReadObject();

    // Read in number of buckets and allocate the bucket array;
    int numBuckets = s.readInt();
    table = new Entry[numBuckets];

    init();  // Give subclass a chance to do its thing.

    // Read in size (number of Mappings)
    int size = s.readInt();

    // Read the keys and values, and put the mappings in the HashMap
    for (int i=0; i<size; i++) {
        K key = (K) s.readObject();
        V value = (V) s.readObject();
        putForCreate(key, value);
    }
}

It is a more laborious approach, but it preserves portability.
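
From the caller's side none of this is visible: a map written with ObjectOutputStream comes back with equal mappings. A minimal round trip through an in-memory byte array, using only standard java.io classes (the class and method names are made up for this sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

class SerializeRoundTripDemo {
    /** Serializes the map to a byte array and reads it back as a new HashMap. */
    static HashMap<String, String> roundTrip(HashMap<String, String> original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original); // goes through HashMap's own writeObject
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            HashMap<String, String> copy = (HashMap<String, String>) in.readObject();
            return copy;
        }
    }

    /** True if the deserialized copy is a distinct object holding equal mappings. */
    static boolean roundTripPreservesMappings() {
        HashMap<String, String> original = new HashMap<>();
        original.put("a", "1");
        original.put(null, "2");
        try {
            HashMap<String, String> copy = roundTrip(original);
            return copy != original && copy.equals(original);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripPreservesMappings()); // true
    }
}
```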

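Differences between HashMap and Hashtable
1. Hashtable allows neither null keys nor null values and throws a NullPointerException for either, while HashMap has no such restriction.
2. HashMap is not thread-safe, while Hashtable is: its public methods are synchronized.
3. Their hash and rehash algorithms differ, and so does iteration: Hashtable also exposes the legacy Enumeration interface (implemented by its internal Enumerator class), while HashMap is traversed with Iterator.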
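
The first difference is easy to check against the public API. A small sketch, with made-up helper names acceptsNullKey and acceptsNullValue:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullHandlingDemo {
    /** True if the map accepts a null key, false if put throws NullPointerException. */
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "v");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    /** True if the map accepts a null value, false if put throws NullPointerException. */
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("k", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));     // true
        System.out.println(acceptsNullValue(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>()));   // false
        System.out.println(acceptsNullValue(new Hashtable<>())); // false
    }
}
```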

Reference: http://www.cnblogs.com/xrq730/p/5030920.html