HashMap Source Code Walkthrough

Latest recommended article published on 2021-07-22 08:09:51

z1340954953


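The previous articles covered ArrayList and LinkedList, which embody two different design ideas:

1> ArrayList is backed by an array. Lookup by index is fast, and so is appending when no resize is needed; inserting and deleting in the middle are slower.

2> LinkedList is implemented as a doubly linked list. Lookup has to traverse forward or backward from the header node, while inserting and deleting elements is fast.

Is there a collection that combines the strengths of both? That is HashMap.

HashMap is a data structure that stores data as key-value pairs.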

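Key points about HashMap

Allows null?	Both key and value may be null (at most one null key)
Allows duplicates?	A duplicate key overwrites the existing value; duplicate values are allowed
Ordered?	No; iteration order is not guaranteed and generally differs from the put order
Thread-safe?	No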
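The properties in the table can be checked against the real java.util.HashMap. The following is a minimal sketch; the class name HashMapTraitsDemo and the sample keys and values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapTraitsDemo {
    // Builds a small map that exercises null keys/values and key overwriting.
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key"); // a null key is allowed
        map.put("a", null);                  // null values are allowed
        map.put("b", "first");
        map.put("b", "second");              // same key: the old value is replaced
        map.put("c", "second");              // duplicate values are fine
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());      // four distinct keys: null, a, b, c
        System.out.println(map.get(null));
        System.out.println(map.get("b"));
    }
}
```

Iteration order is deliberately not printed here, since the table above states that it is not something to rely on.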

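HashMap's data structure

In Java there are two fundamental building blocks: arrays and simulated pointers (references). HashMap is a chained hash table, that is, a combination of an array and linked lists.

As the figure above shows, HashMap is backed by an array, and each slot of the array is in turn a linked list. Creating a new HashMap initializes this array.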

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{
    /**
     * The load factor used when none specified in constructor.
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry[] table;

 static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;

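The source shows that HashMap is backed by an Entry array. Each Entry stores a key-value pair plus a reference to the next Entry, which makes each bucket a linked list.

Storing entries in HashMap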

public V put(K key, V value) {
        // If the key is null, the entry is stored in the table[0] bucket
        if (key == null)
            return putForNullKey(value);
        // Compute the hash from the key's hashCode
        int hash = hash(key.hashCode());
        // Use the hash to compute the index in the array
        int i = indexFor(hash, table.length);
        // Walk the linked list at that index; if a node with the same hash
        // and an equal key exists, overwrite its value and return the old one
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        // Otherwise record the modification and add a new node at this index
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

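The overall process:

Compute the hash of the key and use it to find the position (index) in the array. If that position already holds elements, the entries there are stored as a linked list, with the newly added entry at the head and the older ones behind it. If the position is empty, a new entry is created and placed there.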
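The process just described can be sketched as a tiny chained map. This is a simplified illustration, not the JDK implementation: the names MiniChainedMap and Node are hypothetical, the table has a fixed length of 16, there is no resizing, and the key's hashCode is used directly without the extra hash() step.

```java
class MiniChainedMap<K, V> {
    // One node of a bucket's singly linked list (same fields as Entry).
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed length, never resized
    private int size;

    private int indexOf(int hash) {
        return hash & (table.length - 1);
    }

    // Finds the node for a key in its bucket, or returns null.
    private Node<K, V> find(Object key, int hash) {
        for (Node<K, V> e = table[indexOf(hash)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e;
            }
        }
        return null;
    }

    V put(K key, V value) {
        int hash = (key == null) ? 0 : key.hashCode(); // null key goes to bucket 0
        Node<K, V> existing = find(key, hash);
        if (existing != null) {                        // same key: overwrite, return old value
            V old = existing.value;
            existing.value = value;
            return old;
        }
        int i = indexOf(hash);
        table[i] = new Node<>(hash, key, value, table[i]); // new node becomes the head
        size++;
        return null;
    }

    V get(Object key) {
        Node<K, V> e = find(key, (key == null) ? 0 : key.hashCode());
        return (e == null) ? null : e.value;
    }

    int size() {
        return size;
    }
}
```

Keeping the lookup in one helper makes it visible that put and get share the same two-stage search: index first, then equals along the chain.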

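The following sections look at how each method works in detail.

Step 1: if the key is null, walk the linked list at index table[0]. If a node whose key is null is found, replace its old value with the new one.

If no such node exists, create a new node and place it at the head of table[0]. Afterwards, check whether size has reached capacity * load factor (the threshold); if so, resize the table to twice its length.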

private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this); // called whenever put overwrites the value of an existing entry
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

 void addEntry(int hash, K key, V value, int bucketIndex) {
	Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

Step 2: compute the hash, find the bucket index, walk the linked list at that index, and replace the value if an equal key is found.

static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

static int indexFor(int h, int length) {
        return h & (length-1);
    }

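To locate an element in a HashMap, the key's hash is used to compute its index in the array. Once the index is known, the linked list at that position is checked for an equal key: if one exists, its value is replaced; otherwise a new element is added at that index.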
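To see why the extra hash() step matters, here is a small sketch that reuses the two functions from the listings above. The class name HashSpreadDemo and the two sample hashCodes are illustrative assumptions; the values were chosen so that they differ only in their high bits.

```java
class HashSpreadDemo {
    // Supplemental hash function, copied from the listing above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index, copied from the listing above; length must be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // illustrative hashCodes that differ only above bit 15
        int b = 0x20000;
        int length = 16;
        // Without spreading only the low 4 bits count, so both map to the same bucket.
        System.out.println(indexFor(a, length) + " " + indexFor(b, length));
        // With spreading the high bits influence the index and the two keys separate.
        System.out.println(indexFor(hash(a), length) + " " + indexFor(hash(b), length));
    }
}
```

Because indexFor only looks at the low bits, hash() folds the high bits down first so that they still take part in choosing the bucket.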

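Adding a new node at the bucket index

Note that this listing is the JDK 7 form of addEntry, which delegates node creation to createEntry; the addEntry shown earlier is the JDK 6 form.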

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) { // resize logic; set aside for now
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

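About createEntry:

If a bucket is still empty, put("111","111") creates a node e1 with next = null and sets table[2] = e1 (index 2 is used here for illustration).

Suppose a second node e2 is then put into the same bucket: its next is set to table[2], which is e1, and then table[2] = e2.

Conclusion: each time an element is added to a bucket, table[index] is pointed at the new element, and the new element's next points at the element that table[index] held before. The result is a singly linked list.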
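The head-insertion behaviour of createEntry can be reproduced in isolation. The sketch below models a single bucket only; HeadInsertionDemo, Node, and chainAfterInserting are hypothetical names introduced for this example.

```java
class HeadInsertionDemo {
    // Minimal stand-in for Entry: just a key and a link to the next node.
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts the keys one by one at the head of a single bucket and
    // returns the resulting chain from head to tail.
    static String chainAfterInserting(String... keys) {
        Node head = null;                 // plays the role of table[index]
        for (String k : keys) {
            head = new Node(k, head);     // new node's next = old head, then bucket = new node
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chainAfterInserting("e1", "e2", "e3")); // prints "e3 -> e2 -> e1"
    }
}
```

The most recently added node always ends up at the front of the chain, which is why a bucket lists its entries in reverse insertion order.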

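Why is HashMap's length a power of two?

The default length of HashMap is 16. Otherwise it is the smallest power of two greater than or equal to the requested capacity.

When the length is a power of two, keys are less likely to map to the same array index, which reduces collisions.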

h & (table.length-1)             hash                  table.length-1        
4 & (15-1)                       0100                    1110              = 0100
5 & (15-1)                       0101                    1110              = 0100

4 & (16-1)                       0100                    1111              = 0100
5 & (16-1)                       0101                    1111              = 0101

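With a length of 15, the mask is 1110, so the last bit of the result is always 0. The slots 0001, 0011, 0101, 0111, 1001, 1011, and 1101 can never hold an element, because no hash ever maps to them. This wastes space, raises the probability of collisions, and slows down lookups.

When the length is a power of two, every bit of (length-1) is 1. Provided the hash values are evenly distributed, different keys rarely produce the same index, so collisions are less frequent and lookups are more efficient.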
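The comparison between lengths 15 and 16 can be checked by brute force. This sketch (BucketCoverageDemo and reachableBuckets are illustrative names, and the range of 1024 sample hashes is an arbitrary choice) collects every index that h & (length - 1) can produce.

```java
import java.util.Set;
import java.util.TreeSet;

class BucketCoverageDemo {
    // Collects every index that h & (length - 1) yields for h in [0, 1024).
    static Set<Integer> reachableBuckets(int length) {
        Set<Integer> buckets = new TreeSet<>();
        for (int h = 0; h < 1024; h++) {
            buckets.add(h & (length - 1));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Length 15: the mask is 1110, so only even indexes are ever used.
        System.out.println(reachableBuckets(15));
        // Length 16: the mask is 1111, so every index from 0 to 15 is used.
        System.out.println(reachableBuckets(16));
    }
}
```

With length 15 only 8 of the 15 slots are reachable, whereas with length 16 all 16 slots are.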

Reading

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

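With the groundwork above, this is straightforward. If the key is null, look in table[0] for the node whose key is null and return its value; if there is none, return null.

If the key is not null, compute the hash to find the array index, then search the linked list at that position for a node with the same hash and an equal key. Return its value, or null if no such node exists.

To sum up, HashMap treats each key-value pair as a single unit, an Entry object, and keeps all pairs in an Entry[] array. To store an Entry, the hash algorithm decides its position in the array, and equals then decides its position within the linked list. To retrieve an entry, the hash algorithm likewise locates the array position and equals locates the node.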
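The division of labour between the hash and equals can be observed with a key type whose hashCode is deliberately constant, so that every key lands in the same bucket. This is a sketch against the real java.util.HashMap; the Key class and CollidingKeyDemo are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeyDemo {
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // deliberately constant: every Key collides
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }
    }

    static Map<Key, String> build() {
        Map<Key, String> map = new HashMap<>();
        map.put(new Key("x"), "1");
        map.put(new Key("y"), "2"); // same bucket as "x", but not equal: a second entry
        map.put(new Key("x"), "3"); // equal to the first key: value is overwritten
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = build();
        System.out.println(map.size());
        System.out.println(map.get(new Key("x")));
    }
}
```

Even though all keys share one bucket, equals still tells them apart, at the cost of a linear search within that bucket.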

Resizing HashMap

void addEntry(int hash, K key, V value, int bucketIndex) {
	Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length);
    }

threshold = (int)(capacity * loadFactor)

When the number of entries in the map reaches threshold (capacity * loadFactor), the table is resized to twice its previous capacity.
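As a worked example of the rule above, the following sketch simulates only the bookkeeping (size, threshold, capacity) and not the actual rehashing. ResizeSimulation and capacityAfter are hypothetical names, and the check mirrors the size++ >= threshold condition from the listing.

```java
class ResizeSimulation {
    // Returns the table capacity after inserting n distinct keys into an
    // initially empty table, using the same check as the addEntry listing.
    static int capacityAfter(int n, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int threshold = (int) (capacity * loadFactor);
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size++ >= threshold) {                     // same condition as addEntry
                capacity *= 2;                             // stands in for resize(2 * table.length)
                threshold = (int) (capacity * loadFactor); // recomputed for the new capacity
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        // With capacity 16 and load factor 0.75 the threshold is 12.
        System.out.println(capacityAfter(12, 16, 0.75f)); // still 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // the 13th insert doubles it to 32
    }
}
```

With the defaults, the first resize happens on the 13th insertion and the second on the 25th, once the new threshold of 24 has been reached.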

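HashMap's performance parameters

HashMap provides the following constructors:

new HashMap(): initial capacity 16, load factor 0.75

new HashMap(int capacity): initial capacity is the smallest power of two greater than or equal to capacity, load factor 0.75

new HashMap(int capacity, float loadFactor): both the initial capacity (rounded up in the same way) and the load factor are supplied by the caller

loadFactor: the ratio of the number of entries to the total capacity at which the table is resized. A larger load factor uses space more efficiently but makes lookups slower; a smaller load factor wastes space.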
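The rounding of the requested capacity can be sketched with a simple doubling loop. This is an illustration under stated assumptions rather than a copy of the JDK code: CapacityRounding and roundUpToPowerOfTwo are made-up names, and the upper bound of 1 << 30 is included so the loop cannot overflow.

```java
class CapacityRounding {
    static final int MAXIMUM_CAPACITY = 1 << 30; // largest power of two that fits in an int

    // Returns the smallest power of two that is >= requested (capped at MAXIMUM_CAPACITY).
    static int roundUpToPowerOfTwo(int requested) {
        if (requested > MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1; // double until the request fits
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16, already a power of two
        System.out.println(roundUpToPowerOfTwo(17)); // 32
    }
}
```

A request that is already a power of two is kept as it is, which is why the rule reads "greater than or equal to".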

Fail-fast mechanism

 transient volatile int modCount;

 final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
	    current = e;
            return e;
        }

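If one thread is iterating with an iterator while another thread structurally modifies the map (a put of a new key, or a remove), the iterating thread finds modCount != expectedModCount and throws ConcurrentModificationException. The iterator fails fast. The same check fires in a single thread if the map is modified directly, rather than through the iterator, during iteration.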
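The fail-fast check is easiest to trigger in a single thread, which avoids any timing dependence. The sketch below uses the real java.util.HashMap; FailFastDemo and its two method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes through the map while iterating; reports whether the iterator failed fast.
    static boolean removeDuringIterationFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;         // the next call to next() noticed the modCount mismatch
        }
    }

    // Removes through the iterator itself, which keeps expectedModCount in sync.
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeDuringIterationFails());
        System.out.println(removeViaIterator());
    }
}
```

Removing through Iterator.remove is the supported way to delete entries during iteration, because the iterator updates its own expected count.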

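Differences between HashMap and Hashtable

Hashtable and HashMap are similar key-value collections. The main differences:

1. Hashtable is thread-safe, using synchronized methods to guarantee it; HashMap is not thread-safe.

2. Hashtable allows neither null keys nor null values (a null value is rejected explicitly, and a null key fails at key.hashCode()); HashMap allows both.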

public synchronized V put(K key, V value) {
	// Make sure the value is not null
	if (value == null) {
	    throw new NullPointerException();
	}

	// Makes sure the key is not already in the hashtable.
	Entry tab[] = table;
	int hash = key.hashCode();
	int index = (hash & 0x7FFFFFFF) % tab.length;
	for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
	    if ((e.hash == hash) && e.key.equals(key)) {
		V old = e.value;
		e.value = value;
		return old;
	    }
	}

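3. Hashtable and HashMap use different hash algorithms: Hashtable takes key.hashCode() directly and computes the index as (hash & 0x7FFFFFFF) % length, whereas HashMap applies its own hash() function and then masks with length-1.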
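The null handling and the index computation of Hashtable can be checked with a short sketch. HashtableDemo and its method names are hypothetical; hashtableIndex simply restates the expression from the put listing above, and the length of 11 used in main is an illustrative table size.

```java
import java.util.Hashtable;

class HashtableDemo {
    // True if Hashtable.put rejects a null key with a NullPointerException.
    static boolean rejectsNullKey() {
        try {
            new Hashtable<String, String>().put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // True if Hashtable.put rejects a null value with a NullPointerException.
    static boolean rejectsNullValue() {
        try {
            new Hashtable<String, String>().put("k", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Index expression from the Hashtable listing: clear the sign bit, then take the remainder.
    static int hashtableIndex(int hashCode, int length) {
        return (hashCode & 0x7FFFFFFF) % length;
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey());
        System.out.println(rejectsNullValue());
        System.out.println(hashtableIndex(25, 11)); // 25 % 11
        System.out.println(hashtableIndex(-1, 11)); // sign bit cleared first, so never negative
    }
}
```

Because the index is a remainder rather than a bit mask, the Hashtable table length does not have to be a power of two.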

Reference blog: http://www.cnblogs.com/xrq730/p/5030920.html