How Java HashMap Works

tuobana123

Category: Java Collections

Article link: https://blog.csdn.net/pingnanlee/article/details/40507191

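This article analyzes how HashMap works from the following aspects. The source code quoted is the JDK 7 implementation (an Entry-based table with the useAltHashing flag); JDK 8 and later restructured the class.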

1. Inheritance hierarchy

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable

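AbstractMap is an abstract class that implements the Map interface and provides default implementations of many of its methods, such as get, containsKey, containsValue, and remove, all built on top of the entry set's iterator. HashMap overrides the important ones with hash-based versions. The Map interface declares the operations of the Map family, such as get, put, and remove. Cloneable is a marker interface: it declares no methods, but it allows HashMap's clone override to call Object.clone() without a CloneNotSupportedException. Serializable is also a marker interface; it lets a HashMap be serialized, for example to be persisted or sent over the network.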
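To make the point about AbstractMap concrete, here is a minimal sketch. SingleEntryMap is a hypothetical class written for this illustration, not part of the JDK. It only supplies entrySet(), and the inherited get, containsKey, and size all work by iterating over that set.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical example class: a read-only map with exactly one entry.
class SingleEntryMap extends AbstractMap<String, Integer> {
    private final Set<Map.Entry<String, Integer>> entries =
            Collections.<Map.Entry<String, Integer>>singleton(
                    new AbstractMap.SimpleImmutableEntry<String, Integer>("one", 1));

    // The only method AbstractMap requires a subclass to implement.
    @Override
    public Set<Map.Entry<String, Integer>> entrySet() {
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new SingleEntryMap();
        // All three calls are inherited from AbstractMap and walk entrySet().
        System.out.println(map.get("one"));
        System.out.println(map.containsKey("two"));
        System.out.println(map.size());
    }
}
```

These inherited lookups are linear scans, which is why HashMap replaces them with its own hash-based implementations.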

2. Key fields, methods, and inner classes

1) The serialization version ID

private static final long serialVersionUID = 362498820763181265L;

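2) Fields

DEFAULT_INITIAL_CAPACITY is the default length of the table array in a HashMap.

MAXIMUM_CAPACITY is the upper bound on the table length, 1 << 30.

DEFAULT_LOAD_FACTOR is the default load factor, which determines the value of threshold.

The transient keyword is related to Serializable: fields marked transient are skipped by default serialization.

table is HashMap's main storage structure, an array of Entry<K,V>. An Entry is a key-value pair that also holds a next reference to the following Entry, so each array slot is the head of a singly linked list. (P.S. the Entry definition is a handy reference if you want to implement a linked list in Java.)

size is the number of key-value pairs currently stored.

threshold is the resize threshold: once size reaches threshold, the HashMap grows (see addEntry for the exact condition).

loadFactor is the load factor used to compute threshold: threshold = table.length * loadFactor.

modCount counts structural modifications; iterators use it to detect that the map changed during iteration.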

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
transient Entry<K,V>[] table;
transient int size;
int threshold;
final float loadFactor;
transient int modCount;

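3) Inner classes. There are two: Entry and HashIterator. Entry is HashMap's basic storage unit. HashIterator is the parent class of EntryIterator, KeyIterator, and ValueIterator; it defines the basic methods needed for iteration plus a remove method. The remove method deserves special attention: to delete elements from a HashMap while iterating over it, you must go through the iterator's remove method, otherwise the iterator throws a ConcurrentModificationException.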

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return (key==null   ? 0 : key.hashCode()) ^
                   (value==null ? 0 : value.hashCode());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

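Note the remove method of HashIterator: after deleting the entry it executes expectedModCount = modCount;, which is what keeps removal through the iterator from triggering a ConcurrentModificationException.

When is the exception thrown? nextEntry compares expectedModCount with modCount. If the map's own remove method is called during iteration, or put adds a new key, modCount changes and no longer equals expectedModCount, so the next call to nextEntry throws ConcurrentModificationException.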

private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }
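The fail-fast behaviour described above can be reproduced with the public java.util.HashMap API. The following is a small sketch; the class name FailFastDemo and its helper methods are made up for this example. One method removes through the map during iteration, the other through the iterator.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Removes through the map while a for-each loop is iterating over it.
    static boolean removeThroughMapFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // bumps modCount behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the iterator's next call
        }
    }

    // Removes through the iterator, which resynchronizes expectedModCount.
    static int removeThroughIterator() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(removeThroughMapFails());
        System.out.println(removeThroughIterator());
    }
}
```

Note that the exception has nothing to do with threads in this case: a single thread that modifies the map inside its own loop is enough.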

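4) Methods

a. Constructor. The constructor below is the most basic one; the other constructors all end up calling it, and a HashMap built with the defaults has a table of length 16. Its main job is to initialize table, loadFactor, and threshold. Note that the table length is not the initialCapacity you pass in, but the smallest power of 2 that is greater than or equal to initialCapacity. Why must the length be a power of 2? Because indexFor computes the bucket index as h & (length - 1). When length is a power of 2, length - 1 is all 1 bits in binary, so the mask keeps every low-order bit of h, which is a cheap way to compute h modulo length, and every index from 0 to length - 1 is reachable. With any other length, length - 1 contains 0 bits, some indexes can never be produced, and collisions become more frequent.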

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }
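The rounding and masking logic can be tried out in isolation. The sketch below copies the capacity loop from the constructor and the h & (length - 1) expression; the class CapacityDemo and the helper distinctIndexes are assumptions made for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CapacityDemo {
    // Same loop as in the constructor. Assumes initialCapacity <= 1 << 30,
    // which the constructor enforces before reaching this point.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Bucket index as computed by indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // threshold = capacity * loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Counts how many different bucket indexes hashes 0..999 can reach.
    static int distinctIndexes(int length) {
        Set<Integer> seen = new HashSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // requested 10, table gets 16
        System.out.println(threshold(16, 0.75f));
        System.out.println(distinctIndexes(16));     // power of 2: all 16 buckets used
        System.out.println(distinctIndexes(10));     // mask 1001: only 4 buckets used
    }
}
```

With a length of 10 the mask is binary 1001, so only indexes 0, 1, 8, and 9 can ever be produced, which is exactly the uneven distribution the power-of-2 rule avoids.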

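b. The put method, which handles both insertion and update. Points to note:

HashMap allows a null key, and values may also be null.

put both inserts and updates: if the key already exists, its value is replaced and the old value is returned; otherwise a new entry is added and null is returned.

indexFor computes the array index from the hash, as analyzed above; keep in mind that length is a power of 2.

modCount++: as mentioned earlier, this counter lets iterators detect modification during iteration. It is incremented when put adds a new entry, but not when put only replaces the value of an existing key.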

public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
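A short sketch of the observable behaviour of put, using the public java.util.HashMap API. The class PutDemo and its method names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Returns {result of first put, result of second put, final value}.
    static String[] insertThenUpdate() {
        Map<String, String> map = new HashMap<>();
        String first = map.put("k", "v1");  // new key: returns null
        String second = map.put("k", "v2"); // existing key: returns old value
        return new String[] { first, second, map.get("k") };
    }

    // A null key and a null value are both accepted.
    static boolean acceptsNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("key with null value", null);
        return map.containsKey(null)
                && map.containsKey("key with null value")
                && map.get("key with null value") == null
                && map.size() == 2;
    }

    public static void main(String[] args) {
        String[] r = insertThenUpdate();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]);
        System.out.println(acceptsNulls());
    }
}
```

Because put returns null both for a new key and for a key whose previous value was null, containsKey is the reliable way to tell the two cases apart.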

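The addEntry method. The key points are:

If size has reached the threshold and the target bucket is not empty, the table is doubled, and the hash and bucket index are recomputed for the key being added. resize is expensive: it allocates a new table and relinks every existing entry into it. So if you know roughly how many entries the map will hold, give it a large enough initial capacity up front to avoid resizing.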

void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }
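One way to follow the presizing advice is to derive the initial capacity from the expected number of entries and the load factor. The helper below is a sketch under that assumption; PresizeDemo and initialCapacityFor are hypothetical names, not JDK API.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest capacity whose threshold (capacity * loadFactor) is at least
    // expectedEntries. HashMap then rounds this up to a power of 2 itself.
    static int initialCapacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    static Map<Integer, String> presized(int expectedEntries) {
        return new HashMap<>(initialCapacityFor(expectedEntries, 0.75f), 0.75f);
    }

    public static void main(String[] args) {
        System.out.println(initialCapacityFor(100, 0.75f));
        Map<Integer, String> map = presized(100);
        for (int i = 0; i < 100; i++) {
            map.put(i, "value " + i);
        }
        System.out.println(map.size());
    }
}
```

Passing the expected entry count directly as the initial capacity is a common mistake: with the default load factor of 0.75, a map created with capacity 100 gets a table of 128 and a threshold of 96, so it would still resize before reaching 100 entries.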

The createEntry method creates a new Entry and inserts it at the head of the bucket's linked list.

void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

The core of resize is the transfer step. The implementation is as follows:

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing;
        transfer(newTable, rehash);
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

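The transfer method moves every entry from the old table into the new one. Because each entry is inserted at the head of its new bucket, entries that land in the same bucket end up in reverse order. The implementation is worth studying.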

    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
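The head-insertion loop in transfer can be exercised on its own. The sketch below uses a stripped-down Node class in place of Entry and omits the rehash branch; TransferDemo, Node, and keysInBucket are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TransferDemo {
    // Simplified stand-in for HashMap.Entry: key, cached hash, next pointer.
    static final class Node {
        final String key;
        final int hash;
        Node next;

        Node(String key, int hash, Node next) {
            this.key = key;
            this.hash = hash;
            this.next = next;
        }
    }

    // Same relinking loop as transfer, without the optional rehash.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;               // remember the rest of the old list
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];             // insert at the head of the new bucket
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static List<String> keysInBucket(Node[] table, int index) {
        List<String> keys = new ArrayList<>();
        for (Node e = table[index]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    // Old bucket 1 holds a -> b -> c; all three have hash 1.
    static List<String> demo() {
        Node c = new Node("c", 1, null);
        Node b = new Node("b", 1, c);
        Node a = new Node("a", 1, b);
        Node[] oldTable = new Node[2];
        oldTable[1] = a;
        return keysInBucket(transfer(oldTable, 4), 1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // the chain comes out as c, b, a
    }
}
```

The nodes are reused rather than copied, so only the next references change during a resize.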

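c. The removeEntryForKey method, which implements remove. It first computes the table index, then walks the linked list in that bucket and unlinks the matching entry; there is nothing unusual here.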

    final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }

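d. The clone method returns a copy of the HashMap. Note that it is a shallow copy: the keys and values themselves are not cloned. The copy gets its own table and its own Entry objects, but they refer to the same key and value objects as the original, so a change made to a mutable key or value object is visible through both maps.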

/**
     * Returns a shallow copy of this <tt>HashMap</tt> instance: the keys and
     * values themselves are not cloned.
     *
     * @return a shallow copy of this map
     */
    public Object clone() {
        HashMap<K,V> result = null;
        try {
            result = (HashMap<K,V>)super.clone();
        } catch (CloneNotSupportedException e) {
            // assert false;
        }
        result.table = new Entry[table.length];
        result.entrySet = null;
        result.modCount = 0;
        result.size = 0;
        result.init();
        result.putAllForCreate(this);

        return result;
    }
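The shallow-copy behaviour is easy to observe with a mutable value type. In this sketch, CloneDemo is a made-up class and StringBuilder is used only because it is mutable.

```java
import java.util.HashMap;

class CloneDemo {
    // Returns {value seen through the copy, whether the copy has key "other"}.
    static String[] demo() {
        HashMap<String, StringBuilder> original = new HashMap<>();
        original.put("k", new StringBuilder("v1"));

        @SuppressWarnings("unchecked")
        HashMap<String, StringBuilder> copy =
                (HashMap<String, StringBuilder>) original.clone();

        // Mutating the shared value object is visible through the copy...
        original.get("k").append("-changed");
        // ...but structural changes to the original are not.
        original.put("other", new StringBuilder("x"));

        return new String[] {
            copy.get("k").toString(),
            String.valueOf(copy.containsKey("other"))
        };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]);
        System.out.println(r[1]);
    }
}
```

If independent values are needed, each value has to be copied explicitly while building the new map.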

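3. Thread-safety analysis

It is well known that HashMap is not thread-safe, but where exactly does the problem come from? Mainly from put and remove. As their source shows, both methods first read the head node of the Entry list at the computed index and then work from that head without any synchronization. When two threads operate on the list at the same index at the same time, the thread that finishes later can overwrite the result of the one that finished first, so an update is lost.

In addition, when multiple threads put into the map at the same time, a bucket's linked list can end up with a cycle, causing an infinite loop. This happens when several threads resize simultaneously: while the lists are being rebuilt in transfer, the next references can be left pointing at each other.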
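A real race is hard to reproduce on demand, so the sketch below replays one unlucky interleaving of two put calls by hand, in a single thread. It is a simplified model, not the JDK code: LostUpdateDemo and Node are hypothetical, and the two "threads" are just named local variables.

```java
import java.util.ArrayList;
import java.util.List;

class LostUpdateDemo {
    static final class Node {
        final String key;
        final Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Replays: A reads head, B reads head, A writes, B writes.
    static List<String> interleavedInserts() {
        Node[] table = new Node[1];
        table[0] = new Node("existing", null);

        // Both threads read the same head before either one writes.
        Node headSeenByA = table[0];
        Node headSeenByB = table[0];

        // Thread A links its new entry in front of the head it saw.
        table[0] = new Node("fromA", headSeenByA);
        // Thread B does the same with its stale head, discarding A's entry.
        table[0] = new Node("fromB", headSeenByB);

        List<String> keys = new ArrayList<>();
        for (Node e = table[0]; e != null; e = e.next) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(interleavedInserts()); // "fromA" is missing from the chain
    }
}
```

Both inserts appeared to succeed from each thread's point of view, yet only one entry made it into the bucket.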

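To summarize: a HashMap is an array plus linked lists. A hash is computed from the key, and the array index is computed from the hash. Keys whose hashes map to the same index collide, and collisions are resolved by chaining the entries in a linked list. A null key is a special case: hashCode() cannot be called on it, so it is given hash 0 and stored in the bucket at index 0.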
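As a closing illustration of collision handling, the strings "Aa" and "BB" have the same hashCode, so they always land in the same bucket, yet both remain retrievable because lookup also compares keys with equals. CollisionDemo is a made-up class name for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    // Returns {map size, value for "Aa", value for "BB"}.
    static int[] storeColliding() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash, different key: chained, not overwritten
        return new int[] { map.size(), map.get("Aa"), map.get("BB") };
    }

    public static void main(String[] args) {
        System.out.println(sameHash());
        int[] r = storeColliding();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]);
    }
}
```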