JDK 1.7 HashMap Source Code Analysis

Latest recommended article published on 2024-05-07 11:35:56

csdn产品小助手

Tags: java, data structures and algorithms

Original link: http://www.cnblogs.com/yangyongjie/p/11015174.html

Before looking at how HashMap works, it helps to review a few data structures:

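1. Array: stores data in a contiguous block of memory. Looking up an element by index is O(1). Searching for a given element requires scanning the whole array, which is O(n); for a sorted array, binary search reduces this to O(log n). Ordinary insertions and deletions have to shift array elements, so their average time complexity is O(n).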

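2. Hash table: relies on the array's support for random access by index, mapping a key to an array index in order to locate an element. A hash table is therefore an extension of the array. The function that maps a key to an index is called the hash function, and the result it produces is called the hash value. The design of the hash function is critical: a good one is cheap to compute and spreads the resulting addresses as evenly as possible.

　　Hash collision: different keys produce the same hash value. The two common ways to resolve collisions are open addressing and chaining. ThreadLocalMap, which usually holds only a few elements, uses open addressing. HashMap uses chaining: all elements that map to the same slot are placed in that slot's linked list (array + linked list).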

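3. Linked list: stores data in non-contiguous blocks of memory. It does not support random access, so finding an element means walking the list from its head, which is O(n).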

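　　HashMap is built from an array plus linked lists, that is, an array whose slots hold linked lists. The array is the main body of HashMap, and the linked lists exist only to resolve hash collisions. If the located slot holds no chain (the entry's next reference is null), lookup and removal cost only O(1). If the slot does hold a chain, an insert is O(n) in the chain length: the chain is traversed first, an existing equal key has its value overwritten, and otherwise a new entry is added. A lookup likewise walks the chain and compares keys one by one with the key's equals method, which is also O(n). So the fewer and shorter the chains, the better HashMap performs, and this is why HashMap has a threshold and resizes.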
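The array-plus-linked-list layout described above can be sketched in a few lines. This is only a minimal illustration, not the JDK code: the class name `ToyChainedMap`, the fixed capacity, and the use of `Math.floorMod` for indexing are assumptions made for the example.

```java
import java.util.Objects;

/** Toy chained hash table (hypothetical, for illustration only). */
final class ToyChainedMap<K, V> {
    /** One node of a bucket's singly linked list. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ToyChainedMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    /** Maps a key to a slot; floorMod keeps the result non-negative. */
    private int indexOf(Object key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    /** Overwrites the value of an equal key, otherwise inserts at the head of the chain. */
    V put(K key, V value) {
        int i = indexOf(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]);
        return null;
    }

    /** Walks the chain of the key's slot and compares keys with equals. */
    V get(Object key) {
        for (Node<K, V> n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyChainedMap<Integer, String> map = new ToyChainedMap<>(4);
        map.put(1, "one");
        map.put(5, "five"); // 1 and 5 share slot 1 in a 4-slot table
        System.out.println(map.get(1) + " " + map.get(5));
    }
}
```

Keys 1 and 5 collide in this 4-slot table, so the second `put` has to walk a one-element chain before inserting, which is the O(n) case discussed above.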

The backbone of HashMap is an Entry array. Entry is the basic building block of HashMap, and each Entry holds one key-value pair.

    /**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?,?>[] EMPTY_TABLE = {};
    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Entry is a static nested class of HashMap:

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

Other fields:

    /**
     * The default initial capacity - MUST be a power of two.
     * (Initial capacity is 2^4 = 16; the capacity after each resize must also be a power of two.)
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     * (Maximum capacity: 2^30.)
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor.
     * (Default load factor: 0.75f.)
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;  // resize threshold

Constructor:

    public HashMap(int initialCapacity, float loadFactor) {
        // Validate the initial capacity
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        // For now the threshold simply holds the initial capacity; once the table is
        // actually allocated, it becomes capacity * loadFactor
        threshold = initialCapacity;
        init();
    }

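As you can see, the constructor does not allocate the table. The table array is only built when put is called for the first time.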

The put method:

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
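        // Compute the hash from the key
        int hash = hash(key);
        // Compute the array index from the hash and the table length
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // If the hashes match, compare the keys; if the key already exists, replace its value.
            // Otherwise keep walking the chain; a key that is not found is added as a new entry below.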
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

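        // Fail-fast support: iterators use modCount to detect structural modification.
        // Note that it is incremented before the entry is actually added.
        modCount++;
        // Add the entry; the last argument i is the index into the table array
        addEntry(hash, key, value, i);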
        return null;
    }

inflateTable:

    /**
     * Inflates the table.
     */
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize, i.e. the smallest power of two that is >= toSize:
        // toSize=13 -> capacity=16; toSize=16 -> capacity=16; toSize=28 -> capacity=32.
        // So when you pass an initial capacity initCapacity, the map does not resize once it
        // holds initCapacity * loadFactor entries. It resizes at capacity * loadFactor,
        // where capacity = roundUpToPowerOf2(initCapacity).
        // For toSize > 1, roundUpToPowerOf2 returns the largest power of two <= (toSize - 1) * 2.
        // For toSize = 1 it returns 1, so with an initial capacity of 1 the first put does not resize.
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }
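The rounding and threshold arithmetic in inflateTable can be checked in isolation. The sketch below assumes that `roundUpToPowerOf2` is implemented with `Integer.highestOneBit((number - 1) << 1)`; the helper is private in the JDK and is not shown in this article, so treat the body here as a reconstruction. The class name `CapacityDemo` and the method `thresholdFor` are made up for the example.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two >= number, capped at MAXIMUM_CAPACITY (assumed implementation). */
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    /** Same expression inflateTable uses to derive the resize threshold. */
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        for (int toSize : new int[] {1, 13, 16, 28}) {
            int capacity = roundUpToPowerOf2(toSize);
            System.out.println("toSize=" + toSize
                    + " -> capacity=" + capacity
                    + ", threshold=" + thresholdFor(capacity, 0.75f));
        }
    }
}
```

With the default load factor, a requested size of 13 gives capacity 16 and threshold 12, while a requested size of 1 gives capacity 1 and threshold 0.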

The hash method:

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        // XOR the key's hashCode into hashSeed
        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
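To see what the two shift-and-XOR lines buy, it may help to run them on hash codes that differ only in their high bits. The sketch below copies the bit-spreading steps into a standalone method and assumes `hashSeed` is 0; the class name `SupplementalHashDemo`, the method name `spread`, and the sample values are chosen for illustration.

```java
final class SupplementalHashDemo {
    /** The bit-spreading steps of hash() shown above, with hashSeed assumed to be 0. */
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int mask = 16 - 1; // table length 16
        int[] hashCodes = {0x10000, 0x20000, 0x30000}; // identical low 16 bits
        for (int hc : hashCodes) {
            System.out.println("0x" + Integer.toHexString(hc)
                    + ": raw index " + (hc & mask)
                    + ", spread index " + (spread(hc) & mask));
        }
    }
}
```

Without spreading, all three sample values land in slot 0 of a 16-slot table because only the low four bits are used. After spreading, they land in slots 1, 2 and 3.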

indexFor:

    /**
     * Returns index for hash code h (the index into the table array).
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);  // guarantees the index always falls within the array bounds
    }

So the storage position is finally obtained as follows:

key --hashCode()--> hashCode --hash()--> h --indexFor(), i.e. h & (length-1)--> storage index
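The masking step only behaves like a modulo when the table length is a power of two, which is why the capacity constraint matters. The following sketch compares `indexFor` with `Math.floorMod` for a 16-slot table and then shows which slots are reachable if the length were 10; the class name `IndexForDemo` and the sample values are assumptions for the example.

```java
import java.util.Set;
import java.util.TreeSet;

final class IndexForDemo {
    /** Same expression as HashMap.indexFor in the listing above. */
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        // Power-of-two length: the mask agrees with a non-negative modulo.
        int[] hashes = {17, -1, Integer.MAX_VALUE, -123456};
        for (int h : hashes) {
            System.out.println(h + ": mask=" + indexFor(h, 16)
                    + ", floorMod=" + Math.floorMod(h, 16));
        }

        // Length 10 is not a power of two: the mask is 9 (binary 1001),
        // so only slots built from bits 0 and 3 can ever be selected.
        Set<Integer> reachable = new TreeSet<>();
        for (int h = 0; h < 100; h++) {
            reachable.add(indexFor(h, 10));
        }
        System.out.println("reachable slots for length 10: " + reachable);
    }
}
```

For length 10 only slots 0, 1, 8 and 9 are ever used, so most of the table would stay empty while the chains in those four slots grow.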

addEntry:

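    transient int size;  // number of key-value mappings actually stored
    void addEntry(int hash, K key, V value, int bucketIndex) {
        // Before adding, check whether size has reached the threshold. If it has and the target
        // slot is already occupied, resize the table and recompute the new key's index for the new length
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Double the table length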
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
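        }

        createEntry(hash, key, value, bucketIndex);
    }
    // Store the new key-value pair at the head of the slot's chain and increment size
    void createEntry(int hash, K key, V value, int bucketIndex) {
        // If two threads reach this line at the same time, one thread's assignment below
        // overwrites the other's; this is one of the ways entries get lost
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);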
        size++;
    }

resize:

    void resize(int newCapacity) {
        // Keep a reference to the old table
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // Check whether the table has already reached the maximum capacity
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Create the new table
        Entry[] newTable = new Entry[newCapacity];
        // Move the contents of the old table into the new one
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // Compute the resize threshold for the new table
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

transfer:

Entries within a bucket end up in reverse order in the new table.

    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
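        // Walk the old table and re-index every entry against the new table length;
        // if a slot holds a chain, every entry in the chain is re-indexed as well
        for (Entry<K,V> e : table) {
            // If the slot is non-empty, walk its chain until e == null
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                // The current entry always goes to the head of the slot rather than the tail of
                // the chain, so the chain order is reversed: the entry already in the slot becomes its successor
                e.next = newTable[i];
                // The migrated entry is placed directly in the slot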
                newTable[i] = e;
                e = next;
            }
        }
    }
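The reversal caused by head insertion is easy to observe on a stripped-down model. The sketch below reuses the loop structure of transfer on a hypothetical `Node` type with precomputed hashes; the class name `TransferOrderDemo`, the keys, and the hash values are all chosen so that every node stays in the same slot after the table grows from 2 to 4.

```java
final class TransferOrderDemo {
    /** Minimal stand-in for HashMap.Entry: key, stored hash, next pointer. */
    static final class Node {
        final String key;
        final int hash;
        Node next;

        Node(String key, int hash, Node next) {
            this.key = key;
            this.hash = hash;
            this.next = next;
        }
    }

    /** Same loop shape as transfer(): each node is pushed onto the head of its new slot. */
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    /** Renders a chain as "x -> y -> z". */
    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hashes 1, 5 and 9 all map to slot 1 for capacity 2 and for capacity 4.
        Node c = new Node("c", 9, null);
        Node b = new Node("b", 5, c);
        Node a = new Node("a", 1, b);
        Node[] oldTable = new Node[2];
        oldTable[1] = a;

        System.out.println("before: " + describe(oldTable[1]));
        Node[] newTable = transfer(oldTable, 4);
        System.out.println("after:  " + describe(newTable[1]));
    }
}
```

The chain `a -> b -> c` comes out as `c -> b -> a`. This reordering is what makes the concurrent resize scenario discussed later in the article possible.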

The get method:

    public V get(Object key) {
        // If key is null, look in table[0] directly
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key); // look up the Entry for this key

        return null == entry ? null : entry.getValue();
    }
    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        // Compute the hash from the key's hashCode
        int hash = (key == null) ? 0 : hash(key);
        // Find the slot for the key, then walk the chain and compare keys with equals to find the matching entry
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

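The get method is comparatively simple: key (hashCode) --> hash --> indexFor --> index. It goes to table[index], then walks the chain there if one exists, comparing stored hashes and using the key's equals method to find the matching entry.

Why must hashCode() be overridden whenever equals() is overridden? What goes wrong if it is not?

For example, the User class below overrides equals but not hashCode: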

import java.util.Objects;

public class User {
    private int age;
    private String name;

    public User(int age, String name) {
        this.age = age;
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        User user = (User) o;
        return age == user.age &&
                Objects.equals(name, user.name);
    }

}

Use it as a key in a HashMap, then try to read the value back:

        User user = new User(20, "yangyongjie");
        Map<User, String> map = new HashMap<>(1);
        map.put(user, "菜鸟");
        String value = map.get(new User(20, "yangyongjie"));
        System.out.println(value); //null

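The result is null. Why? By default, hashCode is derived from the object's identity: Object.hashCode() is a native method that typically yields different int values for different objects and is commonly related to the object's memory address. The overridden equals treats two User objects with the same age and name as equal, but the object passed to put and the one passed to get are different instances, so their hash codes are almost certainly different. In general that means put and get compute different indexes. In this example the map has capacity 1, so both land in slot 0, but the lookup still fails because getEntry checks e.hash == hash before it ever calls equals. So whenever you override equals, override hashCode as well.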
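A corrected key class might look like the following sketch. It is not the author's code: the class is renamed to `UserKey` to keep the example self-contained, the fields are made final, and `Objects.hash(age, name)` is just one reasonable way to combine the same fields that equals compares.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class UserKey {
    private final int age;
    private final String name;

    UserKey(int age, String name) {
        this.age = age;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        UserKey other = (UserKey) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    /** Built from the same fields as equals, so equal objects share a hash code. */
    @Override
    public int hashCode() {
        return Objects.hash(age, name);
    }

    public static void main(String[] args) {
        Map<UserKey, String> map = new HashMap<>(1);
        map.put(new UserKey(20, "yangyongjie"), "novice");
        // A different instance with equal fields now finds the entry.
        System.out.println(map.get(new UserKey(20, "yangyongjie")));
    }
}
```

Because both instances now produce the same hash code, put and get agree on the slot and on the stored hash, and equals decides the match.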

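In addition, a Set stores distinct objects, and hash-based implementations such as HashSet decide distinctness using hashCode and equals, so custom objects stored in such a Set must override both methods too.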

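A side note: the default (non-overridden) equals and hashCode are both based on object identity. The difference is that two distinct objects may happen to have the same hashCode, whereas the default equals never reports them as equal.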

Some problems with HashMap

Circular chain (infinite loop):

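　　Suppose threads A and B call resize() on the same HashMap concurrently, and while they execute the while loop in transfer, the current slot holds a-->b-->null (with a and b mapping to the same slot in the new table).

　　1. Thread A is suspended right after executing Entry<K,V> next = e.next; at this point e=a and next=b.

　　2. Thread B runs the whole method to completion; its new table now holds b-->a-->null.

　　3. Thread A resumes and runs the rest of the loop. When it finishes, its newTable holds a<-->b, a circular chain. Any later traversal of that slot, such as a get for a key that is not present or the while(null != e) loop of a later transfer, never terminates, and CPU usage spikes.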
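The interleaving above can be replayed deterministically on a single thread, which makes the resulting cycle easier to inspect than a real race would. The sketch below is a simplified model, not the JDK code: it tracks one slot only, the names `DeadLoopReplay`, `transfer`, `resume` and `hasCycle` are invented for the example, and the cycle check uses Floyd's tortoise-and-hare so that the demo itself always terminates.

```java
final class DeadLoopReplay {
    static final class Node {
        final String key;
        Node next;

        Node(String key) {
            this.key = key;
        }
    }

    /** A full, uninterrupted transfer of one chain into one new slot (thread B). */
    static Node transfer(Node e) {
        Node head = null;
        while (e != null) {
            Node next = e.next;
            e.next = head;
            head = e;
            e = next;
        }
        return head;
    }

    /** Thread A resuming after it had already read next = e.next. */
    static Node resume(Node e, Node next) {
        Node head = null;
        // Finish the interrupted iteration with the stale e/next pair.
        e.next = head;
        head = e;
        e = next;
        // Continue the loop normally, now reading links that thread B rewrote.
        while (e != null) {
            next = e.next;
            e.next = head;
            head = e;
            e = next;
        }
        return head;
    }

    /** Floyd's cycle detection: true if following next never reaches null. */
    static boolean hasCycle(Node head) {
        Node slow = head;
        Node fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    /** Replays steps 1 to 3 and returns the head of thread A's new slot. */
    static Node replay() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;                 // old slot: a --> b --> null

        Node staleE = a;            // step 1: thread A read e = a ...
        Node staleNext = a.next;    // ... and next = b, then was suspended

        transfer(a);                // step 2: thread B finishes, leaving b --> a --> null

        return resume(staleE, staleNext); // step 3: thread A continues
    }

    public static void main(String[] args) {
        Node head = replay();
        System.out.println("head=" + head.key
                + ", head.next=" + head.next.key
                + ", head.next.next=" + head.next.next.key
                + ", cycle=" + hasCycle(head));
    }
}
```

In this replay thread A's own loop terminates, but it leaves `a.next == b` and `b.next == a`. A plain `for (e = head; e != null; e = e.next)` walk over that slot would never reach null.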

Data loss during resize:

　　This also happens in the transfer method called by resize.

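　　1. While the current thread is migrating entries, another thread may add an entry to a slot that has already been traversed. When the traversal finishes and the table reference is pointed at newTable, that entry is lost and is simply garbage collected.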

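　　2. If several threads run resize at the same time, each of them allocates its own new Entry[newCapacity]. At that point the array is a local variable that other threads cannot see. When migration finishes, each resizing thread assigns its array to the shared table field, overwriting the other threads' work, so entries inserted into an overwritten new table are discarded.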
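The first case can also be replayed on a single thread. The sketch below is a simplified model with hypothetical names (`LostInsertReplay`, `Node`, the key "late"): thread A's migration, thread B's insert into the old table, and thread A's final assignment are executed in that order, and the inserted key is then looked up in the published table.

```java
final class LostInsertReplay {
    static final class Node {
        final String key;
        final int hash;
        Node next;

        Node(String key, int hash, Node next) {
            this.key = key;
            this.hash = hash;
            this.next = next;
        }
    }

    /** Same loop shape as transfer(): re-index every node into a new array. */
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    /** Scans every slot of a table for the given key. */
    static boolean contains(Node[] table, String key) {
        for (Node head : table) {
            for (Node e = head; e != null; e = e.next) {
                if (e.key.equals(key)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns whether the late insert survives the resize (it does not). */
    static boolean replay() {
        Node[] table = new Node[2];
        table[1] = new Node("a", 1, null);

        // Thread A: migrates everything into its private newTable.
        Node[] newTable = transfer(table, 4);

        // Thread B: still sees the old table and inserts at the head of slot 1.
        table[1] = new Node("late", 3, table[1]);

        // Thread A: publishes its array, which never saw "late".
        table = newTable;

        return contains(table, "late");
    }

    public static void main(String[] args) {
        System.out.println("late insert still present: " + replay());
    }
}
```

The replay returns false: the entry added to the old array is unreachable once the table reference is switched.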

Reposted from: https://www.cnblogs.com/yangyongjie/p/11015174.html
