HashMap是如何组织数据的

最新推荐文章于 2022-08-17 10:34:10 发布

windcake

最新推荐文章于 2022-08-17 10:34:10 发布

阅读量405

点赞数

分类专栏： JAVA基础回顾文章标签： JAVA集合 hashmap Android

本文链接：https://blog.csdn.net/windcake/article/details/53836359

版权

JAVA基础回顾专栏收录该内容

8 篇文章 0 订阅

订阅专栏

HashMap简介

HashMap存储数据采用了key-value形式，采用数组存储key的HashCode，来实现对数据的存储，这样每提供一个key就能知道它在数组中的位置，而不用像ArrayList一样去遍历。但是不同的key算出的HashCode在一个很小的概率上可能相同，即发生了Hash冲突。HashMap采用了链表法来解决这个冲突。所以HashMap的底层数据结构是数组加链表，链表是单链表。另外HashMap的key可以为null

数据结构

transient HashMapEntry<K, V> entryForNullKey;
transient HashMapEntry<K, V>[] table;
static class HashMapEntry<K, V> implements Entry<K, V> {
    final K key;
    V value;
    final int hash;
    HashMapEntry<K, V> next;

    HashMapEntry(K key, V value, int hash, HashMapEntry<K, V> next) {
        this.key = key;
        this.value = value;
        this.hash = hash;
        this.next = next;
    }
}

table是HashMapEntry类型的数组，而HashMapEntry则是组成单链表的元素。
Entry中key和value自不必说，hash用来存储根据key算出来的hash值。next用来指向链表中的下一个元素。
entryForNullKey用来存储key为null的数据，这个Entry不会被放到table里，但是它里边如果有数据是被算在HashMap里的。
一个典型的HashMap如下
这里写图片描述

数组中的空闲位置是刻意为之，控制这个空闲位置的参数叫Load Factor，翻译成中文有点土叫“加载因子”，加载因子乘以当前总容量叫threshold，也就是阈值。当HashMap中当前元素个数size大于threshold的时候，就会扩容。

当table中的容量越满，那么再加入一个元素时，发生冲突的概率就越大。我们不希望发生冲突。理论上讲，在table装入很少的元素时就及时去扩容当然是会减少冲突，但是扩容很费时间，这样做也会导致table中有大量的空闲位置，也就是浪费了内存空间。所以必须在时间和空间上做一个平衡，Google给出的Load Factor值是0.75，并直接写死不能改变。

还有一个现象，0x08处有两个元素，它是一个只有两个元素短短的单链表。这就是用链表法来解决Hash冲突，其实我更倾向于叫它索引冲突，至于为什么，下文有讲解。

初始化

public HashMap(int capacity) {
       if (capacity < 0) {
           throw new IllegalArgumentException("Capacity: " + capacity);
       }

       if (capacity == 0) {
           @SuppressWarnings("unchecked")
           HashMapEntry<K, V>[] tab = (HashMapEntry<K, V>[]) EMPTY_TABLE;
           table = tab;
           threshold = -1; // Forces first put() to replace EMPTY_TABLE
           return;
       }

       if (capacity < MINIMUM_CAPACITY) {
           capacity = MINIMUM_CAPACITY;
       } else if (capacity > MAXIMUM_CAPACITY) {
           capacity = MAXIMUM_CAPACITY;
       } else {
           capacity = Collections.roundUpToPowerOfTwo(capacity);
       }
       makeTable(capacity);
   }

在指定容量的构造函数中，除了保证容量的范围，还规范了capacity容量必须是是2^n，比如输入的是6，那么会被规范成8,如果是10，那么就被规范成16.这是为什么以及有什么好处当然是有原因的。

put方法

@Override public V put(K key, V value) {
       if (key == null) {
           return putValueForNullKey(value);
       }
      //  根据key算出hash值
       int hash = Collections.secondaryHash(key);
       HashMapEntry<K, V>[] tab = table;
       int index = hash & (tab.length - 1);
       for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
           if (e.hash == hash && key.equals(e.key)) {
            //  这个函数是留给LinkedHashMap用的
               preModify(e);
               V oldValue = e.value;
               e.value = value;
               return oldValue;
           }
       }
       // No entry for (non-null) key is present; create one
       modCount++;
       if (size++ > threshold) {
           tab = doubleCapacity();
           index = hash & (tab.length - 1);
       }
       addNewEntry(key, value, hash, index);
       return null;
   }

如果把map理解为“映射”，我觉得它涵盖了两个方面，一方面是key-value的映射，另一方面是key-hashcoe的映射。把key映射成int型的hash码，之后并不能直接放到table里，table的索引值范围很小，而hashCode的范围很大，他们互相匹配的概率是在是太小了。用整数a对整数b做取余（%）运算，结果区间是[0 - a-1),所以如果对hashCode用table的长度取余得到的值一定在table的索引区间内（hashCode%table.length 属于 [0 - table.length-1)），这样也就确定了Entry在table中的位置。

然而我们并没有看到取余运算，出现在眼前的是&运算，好我们探究一下原因。

2^n转换成二进制的值是1 + n个0，减去1之后，就变成了 0 + n个1,这样通过与运算就取到了低n位。二进制中n位的取值范围是[0- 2^n)或者[0 - 2^n],也就相当于对2^n取余，这一下我们就明白了两个问题为什么table的容量必须是2的n次方，以及取余是如何变成 & 运算的。只有二次方才有这个特性，这样做是因为&的运算速度比%快很多，

然后遍历index位置的链表，虽然这个链表存在的概率不大，但这个动作必须有。如果key相同，就替换它的value如果table中index索引位置没有数据，那么跳出for循环。

虽然说理论上来讲，这种数组+链表的方式是装不满的，发生冲突之后就插入到链表里好了。这样确实不影响插入速度，但是get的时候遍历一个巨长的数组就很费时间了。
如果当前size大于阈值，那么进行扩容。

否则addNewEntry

void addNewEntry(K key, V value, int hash, int index) {
        table[index] = new HashMapEntry<K, V>(key, value, hash, table[index]);
    }

addNewEntry的效果是新table[index]的next属性指向原来table[index]，如果不发生冲突，那么这个过程看起来很像是给数组的某个位置赋值。如果这个被put的元素的key算出的 HashCode正好是06,发生了冲突。那么假设原来的情况是图1，此时要把C放进去，在new C这个Entry的时候，C的next属性指向了B。图2，然后把C放在0x08处，图3。
这里写图片描述

实际上，我们画出的典型HashMap的Capacity是8，在put C之前它的size就已经是6了，如前文所述，阈值=Capacity*加载因子，那么阈值= 8 *0.75=6。put C时候会出发扩容方法doubleCapacity。

private HashMapEntry<K, V>[] doubleCapacity() {
       HashMapEntry<K, V>[] oldTable = table;
       int oldCapacity = oldTable.length;
       if (oldCapacity == MAXIMUM_CAPACITY) {
           return oldTable;
       }
       int newCapacity = oldCapacity * 2;
       HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
       if (size == 0) {
           return newTable;
       }

       for (int j = 0; j < oldCapacity; j++) {
           /*
            * Rehash the bucket using the minimum number of field writes.
            * This is the most subtle and delicate code in the class.
            */
           HashMapEntry<K, V> e = oldTable[j];
           if (e == null) {
               continue;
           }
          //  oldCapacity是2^n所以最高位是1其余位是0，与运算取最高位。
          //  110101010
          //  000001000
          //  000000000
          // 那这个值应该就是 0 或者 16
           int highBit = e.hash & oldCapacity;
           HashMapEntry<K, V> broken = null;
          //  highBit 1000 或者 0000
          //  取 或 运算  然后把e放到这个位置
          // 按理来说，这个index的算法应该是  取余  hash & (newCapacity32 - 1)
          // 和 hash & (oldCapacity16  - 1) 差的只是
           newTable[j | highBit] = e;
          //  处理e位置的 链表
           for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
               int nextHighBit = n.hash & oldCapacity;
               if (nextHighBit != highBit) {
                   if (broken == null)
                       newTable[j | nextHighBit] = n;
                   else
                       broken.next = n;
                   broken = e;
                   highBit = nextHighBit;
               }
           }
           if (broken != null)
               broken.next = null;
       }
       return newTable;
   }

我们知道，Entry在数组中的位置是根据key算出HashCode，然后再用Capacity对其取余。那么当doubleCapacity之后，我们有理由推测在原来table里的数据放到新table的时候，索引值是会发生变化的，所以在重新放的过程中必然会伴随着一个rehash过程，找到新的索引。在当前情况下，key的HashCode是不会变的，变的只是取余过程中的掩码。老实一点的做法，直接取出Entry中的hash属性，然后用新Capacity取余即可。但是Android中没这么做。而是采用了以上做法。
2^n和2^n+1在二进制中只是最高位的1左移了一位。取余运算又变成了与运算，那么被2^n和2^n+1取余之后，就只有最高位不同了。以上算法是先把最高位highBit 取出来，剩下的低位就是原来数组的索引，再把最高位和剩下的低位组合在一起，其效果等于直接对new Capacity取余，就是新index了。这样rehash完成之后，返回新的table，就完成扩容了

下面附上代码和注释

public class HashMap<K, V> extends AbstractMap<K, V> implements Cloneable, Serializable {

    // HashMap的最小容量，必须是2的正整次幂
    private static final int MINIMUM_CAPACITY = 4;

    // HashMap的最大容量（2的三十次方），必须是2的正整次幂
    private static final int MAXIMUM_CAPACITY = 1 << 30;

    // 所有空Map都使用这个空table，
    private static final Entry[] EMPTY_TABLE
            = new HashMapEntry[MINIMUM_CAPACITY >>> 1];
    // 加载因子
    static final float DEFAULT_LOAD_FACTOR = .75F;

    transient HashMapEntry<K, V>[] table;

    transient HashMapEntry<K, V> entryForNullKey;

    transient int size;

    transient int modCount;

    //  阈值，用于判断是否要扩容。默认值是.75 * capacity
    // 比如capacity是100，那这个阈值就是75
    // 也就是说，HashMap装了75个数据之后，就扩容。
    // 此时table被使用了应该是小于等于 75
    private transient int threshold;

    // Views - lazily initialized
    private transient Set<K> keySet;
    private transient Set<Entry<K, V>> entrySet;
    private transient Collection<V> values;

// 数据结构
// 这个Entry是单向链表，也就是发生冲突之后的那个链表。
    static class HashMapEntry<K, V> implements Entry<K, V> {
        final K key;
        V value;
        final int hash;
        HashMapEntry<K, V> next;

        HashMapEntry(K key, V value, int hash, HashMapEntry<K, V> next) {
            this.key = key;
            this.value = value;
            this.hash = hash;
            this.next = next;
        }

        public final K getKey() {
            return key;
        }

        public final V getValue() {
            return value;
        }

        public final V setValue(V value) {
            V oldValue = this.value;
            this.value = value;
            return oldValue;
        }

        @Override public final boolean equals(Object o) {
            if (!(o instanceof Entry)) {
                return false;
            }
            Entry<?, ?> e = (Entry<?, ?>) o;
            return Objects.equal(e.getKey(), key)
                    && Objects.equal(e.getValue(), value);
        }

        @Override public final int hashCode() {
            return (key == null ? 0 : key.hashCode()) ^
                    (value == null ? 0 : value.hashCode());
        }

        @Override public final String toString() {
            return key + "=" + value;
        }
    }

    //  构造函数，构造一个空的HashMap
    @SuppressWarnings("unchecked")
    public HashMap() {
        table = (HashMapEntry<K, V>[]) EMPTY_TABLE;
        // 第一次put数据的时候，强迫替代这个空table
        threshold = -1; // Forces first put invocation to replace EMPTY_TABLE
    }
// 指定容量的构造函数
    public HashMap(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("Capacity: " + capacity);
        }

        if (capacity == 0) {
          // 如果容量为0，构造一个空的HashMap
            @SuppressWarnings("unchecked")
            HashMapEntry<K, V>[] tab = (HashMapEntry<K, V>[]) EMPTY_TABLE;
            table = tab;
            threshold = -1; // Forces first put() to replace EMPTY_TABLE
            return;
        }
//      容量不能越界，且必须为2的整次幂
        if (capacity < MINIMUM_CAPACITY) {
            capacity = MINIMUM_CAPACITY;
        } else if (capacity > MAXIMUM_CAPACITY) {
            capacity = MAXIMUM_CAPACITY;
        } else {
            capacity = Collections.roundUpToPowerOfTwo(capacity);
        }
        makeTable(capacity);
    }

// 指定容量和加载因子，但是指定也白指定，因为忽略了loadFactor
// 也就是加载因子永远都是0.75，这样做简化了代码，提高了程序性能。
    public HashMap(int capacity, float loadFactor) {
        this(capacity);

        if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
            throw new IllegalArgumentException("Load factor: " + loadFactor);
        }

    }

    static int capacityForInitSize(int size) {
        int result = (size >> 1) + size; // Multiply by 3/2 to allow for growth

        // boolean expr is equivalent to result >= 0 && result<MAXIMUM_CAPACITY
        return (result & ~(MAXIMUM_CAPACITY-1))==0 ? result : MAXIMUM_CAPACITY;
    }


    @Override public V put(K key, V value) {
        if (key == null) {
            return putValueForNullKey(value);
        }
//      算出key的Hash值，然后把这个值用table长度减一取余得出索引
//      容量必须是2的幂次方是有原因的，这样就可以将取余%运算变成与 & 运算了
        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        // 这个位置有可能因为冲突存在单链表，所以需要对单链表索引
        // 如果key相同，就换掉它的value。
        // 如果遍历结束没有相同的key，就执行插入操作，也就是addNewEntry。
        for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                preModify(e);
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }

        // No entry for (non-null) key is present; create one
        modCount++;
        if (size++ > threshold) {
            tab = doubleCapacity();
            index = hash & (tab.length - 1);
        }
        // 插入一个Entry，实际上是插入在了单链表的表头。
        addNewEntry(key, value, hash, index);
        return null;
    }

    private V putValueForNullKey(V value) {
        HashMapEntry<K, V> entry = entryForNullKey;
        if (entry == null) {
            addNewEntryForNullKey(value);
            size++;
            modCount++;
            return null;
        } else {
            preModify(entry);
            V oldValue = entry.value;
            entry.value = value;
            return oldValue;
        }
    }

  /**
  HashMapEntry的构造方法
     HashMapEntry(K key, V value, int hash, HashMapEntry<K, V> next) {
         this.key = key;
         this.value = value;
         this.hash = hash;
         this.next = next;
     }
     */
    // 这样的话，是插入在链表首部
    // 即便是当前数组那里存在数据，也可以加入，解决了冲突。
    void addNewEntry(K key, V value, int hash, int index) {
        table[index] = new HashMapEntry<K, V>(key, value, hash, table[index]);
    }

// 通过key去查找value
    public V get(Object key) {
        if (key == null) {
            HashMapEntry<K, V> e = entryForNullKey;
            return e == null ? null : e.value;
        }

        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        for (HashMapEntry<K, V> e = tab[hash & (tab.length - 1)];e != null; e = e.next) {
            K eKey = e.key;
            // 如果key的地址相同，或者hash相同。
            // 疑问 hash对比和equals有什么区别?
            if (eKey == key || (e.hash == hash && key.equals(eKey))) {
                return e.value;
            }
        }
        return null;
    }



    /**
     * Doubles the capacity of the hash table. Existing entries are placed in
     * the correct bucket on the enlarged table. If the current capacity is,
     * MAXIMUM_CAPACITY, this method is a no-op. Returns the table, which
     * will be new unless we were already at MAXIMUM_CAPACITY.
     */
    //  容量扩大2倍
    private HashMapEntry<K, V>[] doubleCapacity() {
        HashMapEntry<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            return oldTable;
        }
        // 创建一个容量翻倍的table
        int newCapacity = oldCapacity * 2;
        HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
        if (size == 0) {
            return newTable;
        }

        for (int j = 0; j < oldCapacity; j++) {
            /*
             * Rehash the bucket using the minimum number of field writes.
             * This is the most subtle and delicate code in the class.
             */
            //  注释说，接下来的一段代码是整个类的精华
            HashMapEntry<K, V> e = oldTable[j];
            if (e == null) {
                continue;
            }
            // 原来的索引
            int highBit = e.hash & oldCapacity;
            HashMapEntry<K, V> broken = null;
            newTable[j | highBit] = e;
            for (HashMapEntry<K, V> n = e.next; n != null; e = n, n = n.next) {
                int nextHighBit = n.hash & oldCapacity;
                if (nextHighBit != highBit) {
                    if (broken == null)
                        newTable[j | nextHighBit] = n;
                    else
                        broken.next = n;
                    broken = e;
                    highBit = nextHighBit;
                }
            }
            if (broken != null)
                broken.next = null;
        }
        return newTable;
    }


    /**
     * Creates a new entry for the null key, and the given value and
     * inserts it into the hash table. This method is called by put
     * (and indirectly, putAll), and overridden by LinkedHashMap.
     */
    void addNewEntryForNullKey(V value) {
        entryForNullKey = new HashMapEntry<K, V>(null, value, 0, null);
    }

    /**
     * Like newEntry, but does not perform any activity that would be
     * unnecessary or inappropriate for constructors. In this class, the
     * two methods behave identically; in LinkedHashMap, they differ.
     */
    HashMapEntry<K, V> constructorNewEntry(
            K key, V value, int hash, HashMapEntry<K, V> first) {
        return new HashMapEntry<K, V>(key, value, hash, first);
    }

    /**
     * Copies all the mappings in the specified map to this map. These mappings
     * will replace all mappings that this map had for any of the keys currently
     * in the given map.
     *
     * @param map
     *            the map to copy mappings from.
     */
    @Override public void putAll(Map<? extends K, ? extends V> map) {
        ensureCapacity(map.size());
        super.putAll(map);
    }


    //  这个方法只被 putAll调用了，
    // 它的容量不是翻倍，而是扩大到刚好装下newmap的2的整次幂
    private void ensureCapacity(int numMappings) {
        int newCapacity = Collections.roundUpToPowerOfTwo(capacityForInitSize(numMappings));
        HashMapEntry<K, V>[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (newCapacity <= oldCapacity) {
            return;
        }
        if (newCapacity == oldCapacity * 2) {
            doubleCapacity();
            return;
        }

        // We're growing by at least 4x, rehash in the obvious way
        HashMapEntry<K, V>[] newTable = makeTable(newCapacity);
        if (size != 0) {
            int newMask = newCapacity - 1;
            for (int i = 0; i < oldCapacity; i++) {
                for (HashMapEntry<K, V> e = oldTable[i]; e != null;) {
                    HashMapEntry<K, V> oldNext = e.next;
                    int newIndex = e.hash & newMask;
                    HashMapEntry<K, V> newNext = newTable[newIndex];
                    newTable[newIndex] = e;
                    e.next = newNext;
                    e = oldNext;
                }
            }
        }
    }


    private HashMapEntry<K, V>[] makeTable(int newCapacity) {
        @SuppressWarnings("unchecked") HashMapEntry<K, V>[] newTable
                = (HashMapEntry<K, V>[]) new HashMapEntry[newCapacity];
        table = newTable;
        threshold = (newCapacity >> 1) + (newCapacity >> 2); // 3/4 capacity
        return newTable;
    }

    @Override public V remove(Object key) {
        if (key == null) {
            return removeNullKey();
        }
        int hash = Collections.secondaryHash(key);
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        for (HashMapEntry<K, V> e = tab[index], prev = null;
                e != null; prev = e, e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                if (prev == null) {
                    tab[index] = e.next;
                } else {
                    prev.next = e.next;
                }
                modCount++;
                size--;
                postRemove(e);
                return e.value;
            }
        }
        return null;
    }

    private V removeNullKey() {
        HashMapEntry<K, V> e = entryForNullKey;
        if (e == null) {
            return null;
        }
        entryForNullKey = null;
        modCount++;
        size--;
        postRemove(e);
        return e.value;
    }

    //  把table里都装上null
    @Override public void clear() {
        if (size != 0) {
            Arrays.fill(table, null);
            entryForNullKey = null;
            modCount++;
            size = 0;
        }
    }

}

windcake

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
HashMap是如何组织数据的

HashMap简介HashMap存储数据采用了key-value形式，采用数组存储key的HashCode，来实现对数据的存储，这样每提供一个key就能知道它在数组中的位置，而不用像ArrayList一样去遍历。但是不同的key算出的HashCode在一个很小的概率上可能相同，即发生了Hash冲突。HashMap采用了链表法来解决这个冲突。所以HashMap的底层数据结构是数组加链表，链表是单链表。
复制链接

扫一扫