So This Is What HashMap Is Really Like


title: So This Is What HashMap Is Really Like
tags:

  • Java
  • JCF
  • HashMap
  • rehash

categories: jcf
date: 2017-09-18 19:39:51

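HashMap is probably one of the most commonly used collection classes in Java.

The analysis below is based on JDK 1.7.

Map is one of the two top-level interfaces in the JCF (Java Collections Framework); the other is Collection.

The Map interface has a few notable characteristics:

  1. size returns an int, so the number of entries it can report cannot exceed Integer.MAX_VALUE. In fact, HashMap's maximum capacity is: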
    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

In other words, HashMap does have a maximum capacity. That raises a question: why is the maximum capacity not 1 << 31?
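
One way to approach this question is to look at the candidate values as Java ints. The sketch below is only an illustration; the class name MaxCapacityDemo is made up for this post, and the constant simply repeats the value quoted above.

```java
public class MaxCapacityDemo {
    // Same value as MAXIMUM_CAPACITY in the JDK 1.7 source quoted above.
    static final int MAXIMUM_CAPACITY = 1 << 30;

    public static void main(String[] args) {
        // int is a 32-bit signed type, so bit 31 is the sign bit.
        System.out.println(1 << 30);           // 1073741824
        System.out.println(1 << 31);           // -2147483648, i.e. Integer.MIN_VALUE
        System.out.println(Integer.MAX_VALUE); // 2147483647, which is 2^31 - 1 and not a power of two
    }
}
```

Since the capacity has to be a power of two stored in an int, 1 << 30 is the largest value that fits.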

  2. containsKey, containsValue, get, and remove all take Object as their parameter rather than the generic type; see
     what-are-the-reasons-why-map-getobject-key-is-not-fully-generic
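
A small sketch of what the Object parameter means in practice. The class name ObjectParamDemo and the helper lookup are hypothetical; only Map.get and Map.put come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;

public class ObjectParamDemo {
    static String lookup(Map<Long, String> map, Object key) {
        // get accepts any Object; the lookup relies on hashCode/equals, not on the static type.
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<Long, String> map = new HashMap<>();
        map.put(1L, "one");

        System.out.println(lookup(map, 1L)); // one
        // This compiles because the parameter is Object, but an Integer never equals a Long.
        System.out.println(lookup(map, 1));  // null
    }
}
```

The second lookup is legal Java yet can never succeed, which is the trade-off of not making these methods fully generic.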

When we use HashMap, we usually call the following constructor

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

What we are actually using here are empirical defaults: an initial capacity of 16 (2^4) and a load factor of 0.75.

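This introduces two new variables:

  1. CAPACITY: the capacity, i.e. the size of the internal array.
  2. LOAD_FACTOR: the load factor, i.e. how full the array may become relative to its capacity before it is resized. This parameter usually has a large impact.

    For example, with a load factor of 10 the table would only be resized once it held 10 entries per slot on average, so lookups would walk chains of about 10 colliding entries. The empirical value of 0.75 is therefore a reasonable choice.

    Related to these two parameters is the threshold parameter: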

        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);

This parameter is the threshold: once size >= threshold, the whole HashMap needs to be resized.
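
To make the numbers concrete, the threshold expression can be evaluated on its own. This is a minimal sketch; ThresholdDemo and the threshold helper are illustrative names, and the expression is copied from the line quoted above.

```java
public class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Mirrors the threshold expression from the JDK 1.7 source quoted above.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(32, 0.75f)); // 24
        System.out.println(threshold(16, 1.0f));  // 16
    }
}
```

With the defaults, the table of 16 slots is due for a resize once it holds 12 entries.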

  1. Initialization

Consider the following code

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
     
        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
     
        this.loadFactor = loadFactor;
        threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        init();
    }
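
The power-of-two loop in the constructor above can be tried in isolation. A minimal sketch, assuming the argument has already been clamped to MAXIMUM_CAPACITY as the constructor does; the class and method names are made up for illustration.

```java
public class CapacityRoundingDemo {
    // Same loop as in the constructor: the smallest power of two >= initialCapacity.
    // Assumes initialCapacity <= 1 << 30, otherwise the shift would overflow.
    static int roundUpToPowerOfTwo(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(10)); // 16
        System.out.println(roundUpToPowerOfTwo(16)); // 16
        System.out.println(roundUpToPowerOfTwo(17)); // 32
        System.out.println(roundUpToPowerOfTwo(0));  // 1
    }
}
```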

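Note one detail: capacity is produced by left-shifting 1, so the actual capacity is always a power of two (the smallest power of two that is not less than the requested value), not necessarily the number that was passed in. (Prime table sizes generally produce fewer collisions; why pow(2,n) is chosen instead is explained below.) table is the array that stores Entry objects, i.e. the key-value mappings we work with.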

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
     
        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
    }
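
Here key is the generic key, value is the generic value, and next is the linked-list pointer, which already gives a picture of the overall structure: an array of buckets, each holding a chain of entries.

In JDK 8, a chain is automatically converted to a red-black tree once it grows beyond a certain length.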

2. When the developer calls put, the following happens

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
     
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

First it checks whether the key is null; if it is, the key is handled specially and treated as hash 0

    /**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }
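
So a null key always lands in table[0], and that chain has to be scanned. If the chain already contains an entry whose key is null, its value is replaced; otherwise a new entry for the null key is inserted.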
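
The observable behaviour of null keys can be checked through the public API. A short sketch; NullKeyDemo and build are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    static Map<String, String> build() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        // A second put with a null key replaces the value instead of adding an entry.
        map.put(null, "second");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = build();
        System.out.println(map.size());            // 1
        System.out.println(map.get(null));         // second
        System.out.println(map.containsKey(null)); // true
    }
}
```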

3. When the key is not null, the hash of the key is computed first

    /**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions.  This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }
     
        h ^= k.hashCode();
     
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
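
The hash function tries to produce evenly distributed hash values, which is why it combines several unsigned right shifts with XOR (Java 8 reworked this).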
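
The effect of the mixing step is easiest to see with hash codes that differ only in their high bits. The sketch below copies just the shift-and-XOR lines from the method above and assumes useAltHashing is false; SupplementalHashDemo and spread are made-up names, and indexFor is the masking helper from the same JDK source.

```java
public class SupplementalHashDemo {
    // The bit-mixing part of the JDK 1.7 hash method quoted above,
    // assuming useAltHashing is false so that h starts from 0.
    static int spread(int hashCode) {
        int h = hashCode;
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;
        int b = 0x20000;
        // Without mixing, both land in bucket 0 of a 16-slot table.
        System.out.println(indexFor(a, 16));         // 0
        System.out.println(indexFor(b, 16));         // 0
        // After mixing, the high bits influence the low bits.
        System.out.println(indexFor(spread(a), 16)); // 1
        System.out.println(indexFor(spread(b), 16)); // 2
    }
}
```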

  4. Locate the position in table from the hash
    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
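
This explains why the length of table is pow(2,n). The usual approach would be a modulo operation, but for framework-level code division is slower than a bitwise AND. pow(2,n)-1 gives masks such as 0111, 01111, 011111, and so on.
ANDing the hash with such a mask extracts its lowest n bits, which effectively discards the upper 32-n bits, so the generated hashes need to differ in their low bits. If the length were not a power of two, length-1 would contain zero bits, the corresponding bits of the hash would be ignored, and some buckets would never be used.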
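
Both claims can be checked with a few lines. This is a sketch only: IndexForDemo and reachable are hypothetical names, and the range 0..1023 is an arbitrary sample of non-negative hashes.

```java
import java.util.TreeSet;

public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects every index that indexFor can produce for hashes 0..1023.
    static TreeSet<Integer> reachable(int length) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1024; h++)
            seen.add(indexFor(h, length));
        return seen;
    }

    public static void main(String[] args) {
        // Power of two: the mask 15 is 1111, so the result equals h mod 16.
        System.out.println(indexFor(37, 16) == 37 % 16); // true
        System.out.println(reachable(16).size());        // 16
        // Not a power of two: the mask 9 is 1001, so only four buckets are ever used.
        System.out.println(reachable(10));               // [0, 1, 8, 9]
    }
}
```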

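  5. If the key is found in the chain for that hash, its value is simply overwritten (this is why hashCode and equals must always be overridden together; otherwise bugs creep in easily)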
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

6. If the key does not exist yet

    /**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }
     
        createEntry(hash, key, value, bucketIndex);
    }
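
The code checks whether size has reached the threshold (and whether the target bucket is already occupied); if so, the table must grow. Each expansion doubles the previous capacity, and every entry then has to be rehashed into the new array. Because the length doubles, only one additional bit of the hash takes part in the index, so an entry whose hash has that bit clear (for example, a hash with only low bits set) keeps its old index and does not actually move (JDK 8 optimizes this).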
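
The effect of doubling on bucket indexes can be checked with a small sketch. It reuses the indexFor expression quoted earlier; the class and method names are made up for illustration, and the hash arguments stand for the already-mixed values stored in e.hash.

```java
public class DoublingIndexDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if the entry ends up in a different bucket after the table doubles.
    static boolean moves(int hash, int oldCapacity) {
        return indexFor(hash, oldCapacity) != indexFor(hash, oldCapacity * 2);
    }

    public static void main(String[] args) {
        int oldCapacity = 16;
        // 5 = 0b00101: the bit with value 16 is clear, so the index stays 5.
        System.out.println(indexFor(5, 32));        // 5
        // 21 = 0b10101: the bit with value 16 is set, so the index becomes 5 + 16.
        System.out.println(indexFor(21, 32));       // 21
        System.out.println(moves(5, oldCapacity));  // false
        System.out.println(moves(21, oldCapacity)); // true
    }
}
```

An entry therefore either stays at its old index or moves to old index + old capacity.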

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
     
        Entry[] newTable = new Entry[newCapacity];
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing;
        transfer(newTable, rehash);
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
    /**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
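
7. When the capacity is already the maximum capacity, the table is not expanded any further, and the threshold is set to the maximum value Integer.MAX_VALUE.
When the capacity has not reached the maximum, all existing entries have to be moved into the new array, which is quite expensive. A sensible load factor and initial capacity are therefore important (consider how many times a large HashMap has to resize when it grows from the default size: for a size of 10000 the table has to reach 2^14 = 16384 slots, since 10000 < 2^14 * 0.75 = 12288, which is about 10 doublings starting from 16).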
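
A rough count of those doublings, for the 10000-entry figure used in the inequality above. This is a simplified model rather than the JDK logic itself: it ignores the extra "bucket is non-empty" condition in addEntry, and ResizeCountDemo and resizesNeeded are hypothetical names.

```java
public class ResizeCountDemo {
    // Simplified model: double the capacity until the threshold can hold all entries.
    // Assumes the required capacity stays well below 1 << 30.
    static int resizesNeeded(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        int resizes = 0;
        while ((int) (capacity * loadFactor) < entries) {
            capacity <<= 1;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesNeeded(10000, 16, 0.75f));    // 10
        System.out.println(resizesNeeded(10000, 16384, 0.75f)); // 0
    }
}
```

Under this model, passing an initial capacity of 16384 to the constructor would avoid all ten intermediate copies.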

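Of course, because transfer re-inserts each entry at the head of its new chain, the order of the entries within a chain is reversed.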
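
The reversal is easy to reproduce with a stripped-down version of transfer. The sketch below keeps only the head-insertion loop; Node, chain, and demo are illustrative names, and the rehash flag from the original method is left out.

```java
public class TransferOrderDemo {
    static final class Node {
        final String key;
        final int hash;
        Node next;

        Node(String key, int hash, Node next) {
            this.key = key;
            this.hash = hash;
            this.next = next;
        }
    }

    // Same head-insertion loop as transfer in the JDK 1.7 source, without the rehash flag.
    static Node[] transfer(Node[] oldTable, int newCapacity) {
        Node[] newTable = new Node[newCapacity];
        for (Node e : oldTable) {
            while (e != null) {
                Node next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static String chain(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.key);
        }
        return sb.toString();
    }

    // Returns the chain in bucket 1 before and after the transfer.
    static String[] demo() {
        Node c = new Node("c", 1, null);
        Node b = new Node("b", 1, c);
        Node a = new Node("a", 1, b);
        Node[] oldTable = new Node[2];
        oldTable[1] = a;
        String before = chain(oldTable[1]);
        Node[] newTable = transfer(oldTable, 4);
        return new String[] { before, chain(newTable[1]) };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]); // a->b->c
        System.out.println(result[1]); // c->b->a
    }
}
```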

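8. Using the computed hash and the corresponding index, the new entry is placed at the head of the bucket, with the entry previously stored in that slot becoming its next pointer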

    /**
     * Like addEntry except that this version is used when creating entries
     * as part of Map construction or "pseudo-construction" (cloning,
     * deserialization).  This version needn't worry about resizing the table.
     *
     * Subclass overrides this to alter the behavior of HashMap(Map),
     * clone, and readObject.
     */
    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

Since no locks are used anywhere, HashMap is not thread-safe; sharing one instance between threads easily leads to all kinds of problems.
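
When a map really has to be shared, one option outside the scope of this analysis is java.util.concurrent.ConcurrentHashMap. A minimal sketch, assuming Java 8 or later (for lambdas and merge); SharedCounterDemo and countWithTwoThreads are made-up names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCounterDemo {
    static int countWithTwoThreads() {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++)
                counts.merge("hits", 1, Integer::sum); // atomic on ConcurrentHashMap
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads()); // 2000
    }
}
```

The same loop against a plain HashMap gives no such guarantee, because nothing in its put path is synchronized.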

A small test:

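    import java.util.HashMap;
    import java.util.Map;

    import org.junit.Test; // assuming JUnit 4

    // The wrapper class name is illustrative; the original snippet shows only its members.
    public class HashMapKeyTest {

        @Test
        public void testA() {
            A a = new A(10);
            Map<A, Object> map = new HashMap<>();
            map.put(a, a.getA());
            // Mutate the field that hashCode and equals depend on.
            a.setA(100);
            System.out.println(map.get(a));
            map.put(a, a.getA());
            System.out.println(map.size());
        }

        static class A {
            private int a;

            public A(int a) {
                this.a = a;
            }

            public int getA() {
                return a;
            }

            public void setA(int a) {
                this.a = a;
            }

            @Override
            public int hashCode() {
                return a;
            }

            @Override
            public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof A)) return false;
                A a1 = (A) o;
                return a == a1.a;
            }
        }
    }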

What is the result?
