[JDK Source Code Analysis, Part 2] java.util.HashMap Source Code Analysis

wangqisen

Tags: java, data structures and algorithms

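HashMap is one of the container classes we use most often. Having read through the source of the HashMap class, I will walk through it here. The excerpts below appear to come from a JDK 7 build: they rely on the Entry class and the useAltHashing field, which JDK 8 and later no longer have.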

1. Class hierarchy of HashMap

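HashMap extends the abstract class AbstractMap, which is a skeletal implementation of the Map interface. AbstractMap already implements most of the Map methods and provides two nested Entry classes, SimpleEntry and SimpleImmutableEntry. The only difference between them is that the latter does not allow its value to be changed.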

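HashMap, in turn, is the hash-table-based implementation built on top of AbstractMap. In fact, the simplest way to write your own Map is to extend AbstractMap and implement entrySet(). That alone gives a read-only map; a modifiable one additionally needs put() and an entry-set iterator that supports remove().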
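To make the entrySet() point concrete, here is a minimal sketch of a read-only map. The class name TwoEntryMap and its contents are made up for illustration; only AbstractMap, AbstractMap.SimpleImmutableEntry and Collections.unmodifiableSet come from the JDK.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical read-only map: only entrySet() is written by hand.
// get(), containsKey(), size() and friends are inherited from AbstractMap.
class TwoEntryMap extends AbstractMap<String, Integer> {
    private final Set<Map.Entry<String, Integer>> entries;

    TwoEntryMap() {
        Set<Map.Entry<String, Integer>> s = new HashSet<>();
        s.add(new AbstractMap.SimpleImmutableEntry<>("one", 1));
        s.add(new AbstractMap.SimpleImmutableEntry<>("two", 2));
        entries = Collections.unmodifiableSet(s);
    }

    @Override
    public Set<Map.Entry<String, Integer>> entrySet() {
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new TwoEntryMap();
        System.out.println(m.get("two"));
        System.out.println(m.containsKey("three"));
        System.out.println(m.size());
    }
}
```

The inherited get() simply scans the entry set, so this is linear-time lookup; HashMap overrides these methods precisely to avoid that.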

2. HashMap's data structure

Why a hash table?

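The two most basic data structures are the array and the linked list. An array makes lookup by index fast but makes inserting and deleting elements expensive; a linked list makes insertion and deletion cheap but lookup slow. A hash table can be seen as a compromise between the two: it supports both fast lookup and fast insertion and deletion.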

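HashMap implements its hash table with separate chaining: the table is an array of buckets, and each bucket holds a singly linked list of the entries whose keys map to that index.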
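As a rough illustration of separate chaining, the sketch below keeps an array of buckets and a singly linked list per bucket. ChainedTable and Node are hypothetical names, the table has a fixed size of 16 and never resizes, and the key's hashCode() is used without extra mixing, so this is far simpler than the real HashMap.

```java
import java.util.Objects;

// Simplified separate-chaining table (illustration only, not JDK code).
class ChainedTable<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    // The length is a power of two, so a bit mask selects the bucket.
    private int indexOf(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h & (buckets.length - 1);
    }

    // Returns the previous value, or null if the key was absent.
    V put(K key, V value) {
        int i = indexOf(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // new node becomes the head
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = buckets[indexOf(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }
}
```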

3. HashMap in detail

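First, HashMap permits a null key and null values, whereas Hashtable does not. HashMap is also not thread-safe: when several threads work on the same HashMap, access must be synchronized externally, or the map must be wrapped with Collections.synchronizedMap(Map m). In addition, HashMap does not guarantee that the order of key-value pairs stays the same from one traversal to the next. The reason is discussed later.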
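A small sketch of these points, using only standard java.util classes. The class name NullKeyDemo and its method names are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // HashMap accepts both a null key and null values.
    static Map<String, String> mapWithNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value for null key");
        map.put("k", null);
        return map;
    }

    // Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        try {
            new Hashtable<String, String>().put(null, "v");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Wrapper whose individual calls are synchronized. Iterating over it
    // still requires synchronizing on the returned map manually.
    static Map<String, String> synchronizedWrapper() {
        return Collections.synchronizedMap(new HashMap<String, String>());
    }

    public static void main(String[] args) {
        System.out.println(mapWithNulls().get(null));
        System.out.println(hashtableRejectsNullKey());
        Map<String, String> shared = synchronizedWrapper();
        shared.put("a", "b");
        synchronized (shared) {
            for (Map.Entry<String, String> e : shared.entrySet()) {
                System.out.println(e);
            }
        }
    }
}
```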

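Provided the hash function spreads keys evenly across the buckets, get() and put() run in constant time. Iterating over a HashMap, by contrast, takes time proportional to the number of buckets (the capacity) plus the number of key-value pairs: the larger that sum, the longer iteration takes. So if iteration performance matters, the initial capacity should not be set too high.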

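Two parameters affect HashMap performance: the initial capacity and the load factor. A load factor that is too high reduces space overhead but increases lookup cost; one that is too low wastes space. In general, a load factor of 0.75 is a good tradeoff between time and space. By default, the capacity is 16 and the load factor is 0.75.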

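Next, HashMap has two fields, useAltHashing and hashSeed. The former is a boolean flag that controls whether an alternative hash function is used in place of HashMap's default hashing of the key, for key types such as String that are prone to hash collisions. As for hashSeed, here is an answer quoted from Stack Overflow: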

The seed parameter is a means for you to randomize the hash function. You should provide the same seed value for all calls to the hashing function in the same application of the hashing function. However, each invocation of your application (assuming it is creating a new hash table) can use a different seed, e.g., a random value.

Why is it provided?

One reason is that attackers may use the properties of a hash function to construct a denial of service attack. They could do this by providing strings to your hash function that all hash to the same value destroying the performance of your hash table. But if you use a different seed for each run of your program, the set of strings the attackers must use changes.

HashMap's capacity is always a power of two, and the maximum capacity is 2^30, that is 1,073,741,824 buckets, or roughly one billion.

The HashMap constructor:

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }

As you can see, the code written by the masters is logically rigorous, efficient, and, most importantly, easy to read.

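If the initial capacity passed in is not a power of two, the constructor rounds it up to the smallest power of two that is greater than or equal to it and uses that as the table size. threshold marks the point at which the table has to grow: it is capacity * loadFactor, and once the number of key-value pairs reaches it, the capacity is doubled.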
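The rounding and threshold arithmetic can be tried in isolation. The sketch below copies the loop and the threshold expression from the constructor into a hypothetical helper class, CapacityDemo; the method names are mine, not the JDK's.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30; // 1,073,741,824

    // Same loop as in the constructor: smallest power of two >= initialCapacity.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Same expression as in the constructor: the size at which the table grows.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(threshold(16, 0.75f));    // 12
    }
}
```

With the defaults, then, the table of 16 buckets is doubled once the map holds 12 entries.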

  final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

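The above is the hash function. The hash here is not the hash of the map itself but the hash of each entry's key; it is this value that determines the index used to locate the entry. The clever part of the function is that it lets the high-order bits of the key's hash code take part in the computation, which greatly reduces the chance of collisions when the table is small.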
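To see the effect of the bit mixing, the sketch below applies the two shift-and-XOR lines from hash() to two hash codes that differ only in their high bits. SpreadDemo, spread and index are illustrative names, and the useAltHashing / hashSeed branch is left out.

```java
class SpreadDemo {
    // The supplemental mixing steps from hash(), without the seed branch.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same as indexFor(): keep only the low bits.
    static int index(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, both land in bucket 0 of a 16-slot table.
        System.out.println(index(a, 16) + " " + index(b, 16));
        // After mixing, the high bits influence the low bits: buckets 1 and 2.
        System.out.println(index(spread(a), 16) + " " + index(spread(b), 16));
    }
}
```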

 static int indexFor(int h, int length) {
        return h & (length-1);
    }

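The code above maps a key's hash to an array index. This is where the power-of-two capacity pays off. Once you have the hash, taking it modulo the array length is the most natural way to get an index; instead, the authors use a single bitwise AND, which is cheaper and, for a power-of-two length, gives the same result. If the capacity were not a power of two, length-1 would contain zero bits in its binary form, some indexes could never be produced, and the remaining buckets would suffer heavy collisions. That is why the capacity must be a power of two.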
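A quick experiment makes both claims visible. IndexDemo and reachable are hypothetical names; the comparison uses Math.floorMod (Java 8+) because the mask also yields a non-negative index for negative hashes.

```java
import java.util.TreeSet;

class IndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects every bucket index that indexFor produces for hashes 0..999.
    static TreeSet<Integer> reachable(int length) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Power of two: the mask agrees with the mathematical modulus.
        System.out.println(indexFor(-7, 16) == Math.floorMod(-7, 16)); // true
        System.out.println(reachable(16).size());                      // 16
        // Length 10: the mask is 9 (binary 1001), so only four buckets are used.
        System.out.println(reachable(10));                             // [0, 1, 8, 9]
    }
}
```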

public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

    /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

The code above looks up the value for a key. Note that the entry whose key is null is stored in the bucket at index 0.

 final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

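This method returns the entry for a given key. It first computes the array index from the key's hash, then walks the linked list at that index and returns the entry whose stored hash matches and whose key is identical or equal to the given key.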
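The equals() check matters because different keys can share a hash code. A well-known pair is "Aa" and "BB", whose String hash codes are both 2112. The sketch below (CollisionDemo is an illustrative name) shows that HashMap still keeps them apart.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Two keys with identical hashCode() values end up in the same bucket,
    // but equals() distinguishes them during lookup.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        Map<String, Integer> map = build();
        System.out.println(map.get("Aa") + " " + map.get("BB"));     // 1 2
    }
}
```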

public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

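Note that modCount supports the fail-fast mechanism during iteration. It records the number of structural modifications to the map; if an iterator finds that modCount has changed, the map was modified during iteration and a ConcurrentModificationException is thrown. Overwriting the value of an existing key returns before modCount++ is reached, so it does not count as a structural modification.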
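The fail-fast behavior is easy to trigger from a single thread. In this sketch (FailFastDemo and its method names are illustrative), adding a new key while iterating causes the next call to the iterator to throw, whereas overwriting an existing key does not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {
    private static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adding a mapping is a structural change: modCount moves and the iterator notices.
    static boolean addingDuringIterationFails() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put("new-" + key, 0);
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Replacing the value of an existing key leaves modCount untouched.
    static boolean overwritingDuringIterationIsFine() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 0);
            }
            return true;
        } catch (ConcurrentModificationException unexpected) {
            return false;
        }
    }
}
```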

void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

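While copying the old table into the new one, every entry has to be placed again. Each entry is inserted at the head of its new chain, so entries moved later end up in front. Note these lines: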

 int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
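A consequence of head insertion is that entries which stay together in one bucket come out in reverse order. The sketch below replays the same pointer steps on a bare linked list; HeadInsertDemo, Node, chain and render are hypothetical names, and it assumes for simplicity that every node lands in the same new bucket.

```java
class HeadInsertDemo {
    static final class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Builds a chain in the given order, e.g. chain("A", "B") is A->B.
    static Node chain(String... keys) {
        Node head = null;
        for (int i = keys.length - 1; i >= 0; i--) {
            head = new Node(keys[i], head);
        }
        return head;
    }

    // Moves every node to a new chain with the same steps as transfer().
    static Node moveAll(Node old) {
        Node newHead = null;
        Node e = old;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // link e in front of the current head
            newHead = e;        // e becomes the new head
            e = next;
        }
        return newHead;
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append("->");
            }
            sb.append(n.key);
        }
        return sb.toString();
    }
}
```

Calling render(moveAll(chain("A", "B", "C"))) gives C->B->A.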

   final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }

This is the code that removes the entry for a given key. Because removal changes the structure of the map, modCount is incremented as well.

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry e = (Map.Entry)o;
            Object k1 = getKey();
            Object k2 = e.getKey();
            if (k1 == k2 || (k1 != null && k1.equals(k2))) {
                Object v1 = getValue();
                Object v2 = e.getValue();
                if (v1 == v2 || (v1 != null && v1.equals(v2)))
                    return true;
            }
            return false;
        }

        public final int hashCode() {
            return (key==null   ? 0 : key.hashCode()) ^
                   (value==null ? 0 : value.hashCode());
        }

        public final String toString() {
            return getKey() + "=" + getValue();
        }

        /**
         * This method is invoked whenever the value in an entry is
         * overwritten by an invocation of put(k,v) for a key k that's already
         * in the HashMap.
         */
        void recordAccess(HashMap<K,V> m) {
        }

        /**
         * This method is invoked whenever the entry is
         * removed from the table.
         */
        void recordRemoval(HashMap<K,V> m) {
        }
    }

This is the Entry nested class used by HashMap. The hash code of an entry is the XOR of the key's hash code and the value's hash code.
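The XOR rule is part of the Map.Entry contract, so it can be checked against any HashMap. EntryHashDemo and its method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class EntryHashDemo {
    // Returns the only entry of a one-element map.
    static Map.Entry<String, Integer> singleEntry() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        return map.entrySet().iterator().next();
    }

    // The formula from Entry.hashCode(): null counts as 0.
    static int expected(Object key, Object value) {
        return (key == null ? 0 : key.hashCode())
             ^ (value == null ? 0 : value.hashCode());
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97 and Integer 1 hashes to 1, so this prints 96 twice.
        System.out.println(singleEntry().hashCode());
        System.out.println(expected("a", 1));
    }
}
```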

private abstract class HashIterator<E> implements Iterator<E> {
        Entry<K,V> next;        // next entry to return
        int expectedModCount;   // For fast-fail
        int index;              // current slot
        Entry<K,V> current;     // current entry

        HashIterator() {
            expectedModCount = modCount;
            if (size > 0) { // advance to first entry
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
        }

        public final boolean hasNext() {
            return next != null;
        }

        final Entry<K,V> nextEntry() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Entry<K,V> e = next;
            if (e == null)
                throw new NoSuchElementException();

            if ((next = e.next) == null) {
                Entry[] t = table;
                while (index < t.length && (next = t[index++]) == null)
                    ;
            }
            current = e;
            return e;
        }

        public void remove() {
            if (current == null)
                throw new IllegalStateException();
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
            Object k = current.key;
            current = null;
            HashMap.this.removeEntryForKey(k);
            expectedModCount = modCount;
        }
    }

This is the iterator implemented by the map. The code is concise and easy to follow, and well worth studying closely.
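The remove() method above re-synchronizes expectedModCount with modCount, which is why removing through the iterator is safe during a traversal. A small sketch, with IteratorRemoveDemo as an illustrative name:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IteratorRemoveDemo {
    // Removes every entry with an odd value while iterating.
    static Map<String, Integer> removeOddValues() {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 1; i <= 6; i++) {
            map.put("k" + i, i);
        }
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 != 0) {
                it.remove(); // goes through the iterator, so no exception is thrown
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(removeOddValues().keySet());
    }
}
```

Calling map.remove(key) inside the same loop instead would change modCount behind the iterator's back and typically end in a ConcurrentModificationException.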

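One more point about the fail-fast mechanism: it only detects, on a best-effort basis, structural changes made to the map while a HashMap iterator is in use. It gives no guarantee at all that concurrent operations on the map are safe, so it should be relied on only for detecting bugs.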
