HashMap Explained

年少峰

Article link: https://blog.csdn.net/u011475873/article/details/46442049

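Lately I keep running into HashMap while solving practice problems. At first I only knew it as the collection class that stores key-value mappings <K,V>, and I simply used it. As I dug deeper I read its source code and learned its underlying structure, and my understanding became more solid. Here I pull together various resources and share my own take.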

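HashMap is a hash-table-based implementation of the Map interface. It extends AbstractMap, provides all of the optional map operations, and permits null keys and null values. The JDK documentation notes that this class makes no guarantees about the order of the map; in particular, it does not guarantee that the order stays constant over time. The reason is that iteration follows the bucket order of the underlying table, and a resize redistributes entries across buckets.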
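To make the null-key and null-value behaviour concrete, here is a minimal sketch. The class name NullKeyDemo and the helper build() are made up for illustration; only the java.util.HashMap calls are real API.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    // Builds a small map that uses a null key and a null value.
    static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<>();
        m.put(null, 1);    // a null key is allowed
        m.put("a", null);  // null values are allowed too
        m.put(null, 2);    // replaces the earlier mapping for the null key
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = build();
        System.out.println(m.size());            // two mappings: null and "a"
        System.out.println(m.get(null));         // value stored under the null key
        System.out.println(m.containsKey("a"));  // the key is present although its value is null
        System.out.println(m.get("a"));          // get alone cannot tell "null value" from "absent"
    }
}
```

Because get returns null both for a missing key and for a key mapped to null, containsKey is the reliable way to tell the two cases apart.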

AbstractMap

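AbstractMap provides a skeletal implementation of the Map interface, to minimize the effort required to implement it. The class supplies several basic methods, such as containsValue(Object value), containsKey(Object key), get(Object key), remove(Object key) and clear(). All of them are built on the entrySet() view: containsValue, containsKey, get and remove iterate over the entry set, which is simple but linear in the size of the map, while clear() just delegates to entrySet().clear(). Representative source code: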

    public boolean containsKey(Object key) {
        Iterator<Map.Entry<K,V>> i = entrySet().iterator();
        if (key==null) {
            while (i.hasNext()) {
                Entry<K,V> e = i.next();
                if (e.getKey()==null)
                    return true;
            }
        } else {
            while (i.hasNext()) {
                Entry<K,V> e = i.next();
                if (key.equals(e.getKey()))
                    return true;
            }
        }
        return false;
    }

    public void clear() {
        entrySet().clear();
    }

In AbstractMap, put is not really implemented: it always throws UnsupportedOperationException, so a subclass that wants to be modifiable must override it.

    public V put(K key, V value) {
        throw new UnsupportedOperationException();
    }
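As a rough illustration of how little a subclass has to supply, the sketch below extends AbstractMap and implements only entrySet(). SingleEntryMap is a hypothetical class invented for this example; get and containsKey are inherited, and put still throws because it is not overridden.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical read-only map holding exactly one mapping.
class SingleEntryMap<K, V> extends AbstractMap<K, V> {
    private final Set<Map.Entry<K, V>> entries;

    SingleEntryMap(K key, V value) {
        Map.Entry<K, V> e = new AbstractMap.SimpleImmutableEntry<>(key, value);
        this.entries = Collections.singleton(e);
    }

    // The only abstract method of AbstractMap; everything else builds on it.
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new SingleEntryMap<>("one", 1);
        System.out.println(m.get("one"));          // inherited get() scans entrySet()
        System.out.println(m.containsKey("two"));  // inherited containsKey() scans entrySet()
        try {
            m.put("two", 2);                       // put() was not overridden
        } catch (UnsupportedOperationException ex) {
            System.out.println("put is unsupported");
        }
    }
}
```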

This class also uses two keywords worth noting, transient and volatile.

transient Set<K>        keySet = null;
transient volatile Collection<V> values = null;

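transient is a Java keyword used as a field modifier. When an instance field is declared transient, its value is not part of the object's persistent state: when the object is serialized (written out as a binary stream), fields marked transient are not written.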
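The effect of transient is easy to observe with a serialization round trip. The following is a small sketch; the Account class and its fields are invented for the example, and only the java.io classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class: 'cache' is transient, 'name' is not.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        transient String cache;

        Account(String name, String cache) {
            this.name = name;
            this.cache = cache;
        }
    }

    // Serializes the object into a byte array and reads it back.
    static Account roundTrip(Account in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = roundTrip(new Account("alice", "derived-data"));
        System.out.println(copy.name);   // restored from the stream
        System.out.println(copy.cache);  // null: the transient field was not written
    }
}
```

A transient field comes back with its default value (null here), so a class that marks state as transient has to rebuild or re-derive it after deserialization.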

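volatile is a Java field modifier, typically used in multithreaded code. Reads and writes of a volatile field are not optimized away or cached in a register by the compiler; every read goes to main memory, so a thread always sees the most recent write made by any other thread.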

Now let's focus on HashMap itself.

HashMap

HashMap data structure

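HashMap is a hash-table-based implementation of Map. Under the hood, its storage consists of one array plus a number of linked lists. The figure below shows the data structure of HashMap.

[Figure: HashMap data structure]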

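As the figure shows, the vertical column on the left is an array of Entry objects. Each Entry (shown on the right of the figure) holds a key, a value, a hash value and a reference next to the following entry in the same horizontal chain; next is what handles hash collisions. The Entry source code: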

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        final int hash;
        Entry<K,V> next;
        // ...
    }

First, some important fields of HashMap:

    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    /**
     * The number of key-value mappings contained in this map.
     */
    transient int size;
    /**
     * The next size value at which to resize (capacity * load factor).
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;
    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;
    /**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     */
    transient int modCount;

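Here table is the array that holds the entries, size is the number of mappings in the map, and threshold is the resize threshold, equal to load factor * capacity. Once the actual size exceeds this threshold the table is resized, and resizing grows the capacity by doubling. The following excerpt from putAll shows the capacity being doubled until it can hold the incoming keys: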

        if (numKeysToBeAdded > threshold) {
            int targetCapacity = (int)(numKeysToBeAdded / loadFactor + 1);
            if (targetCapacity > MAXIMUM_CAPACITY)
                targetCapacity = MAXIMUM_CAPACITY;
            int newCapacity = table.length;
            while (newCapacity < targetCapacity)
                newCapacity <<= 1;
            if (newCapacity > table.length)
                resize(newCapacity);
        }

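The load factor expresses how full the HashMap is allowed to get. A larger load factor gives better space utilization, but collisions become more likely, chains grow longer, and lookups take more time. A smaller load factor makes collisions less likely but wastes space. So the load factor and the capacity have to be balanced against each other to keep both costs low. The default load factor is 0.75.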
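The relationship threshold = capacity * load factor can be tabulated with a few lines of code. This is only a sketch of the arithmetic; ThresholdDemo and its threshold helper are names made up here, not JDK methods.

```java
class ThresholdDemo {
    // threshold = capacity * loadFactor, truncated to an int.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;  // the default load factor
        // Capacity doubles on each resize, so the threshold doubles with it.
        for (int capacity = 16; capacity <= 128; capacity <<= 1) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + threshold(capacity, loadFactor));
        }
    }
}
```

With the default capacity of 16 and load factor of 0.75 the threshold is 12, so the table is a candidate for doubling once the map already holds 12 mappings.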

HashMap constructors

Next, the constructors. There are four in the source:

   public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs a new <tt>HashMap</tt> with the same mappings as the
     * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified <tt>Map</tt>.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                      DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
        inflateTable(threshold);

        putAllForCreate(m);
    }

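Which constructor gets called depends on the situation; for example, specifying both the initial capacity and the load factor uses the first one. Note that these constructors only record the requested capacity in threshold. The table itself is allocated later by inflateTable, which rounds that value up to a power of two, so the capacity of the map is always a power of two.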
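One way to keep the capacity a power of two is to round any requested size up to the next power of two. The helper below is a sketch under that assumption; PowerOfTwo and roundUp are illustrative names, and this is not claimed to be the exact JDK code.

```java
class PowerOfTwo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Returns the smallest power of two >= requested, capped at MAXIMUM_CAPACITY.
    static int roundUp(int requested) {
        if (requested >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        if (requested <= 1) {
            return 1;
        }
        // (requested - 1) << 1 has its highest set bit exactly at the answer.
        return Integer.highestOneBit((requested - 1) << 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, 17, 100};
        for (int n : samples) {
            System.out.println(n + " -> " + roundUp(n));
        }
    }
}
```

Subtracting one before shifting keeps exact powers of two unchanged: 16 stays 16, while 17 becomes 32.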

Storing and retrieving data

The most common operations on a HashMap are storing and retrieving data. Let's look at storing first.

    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

In put, you can see that when the key is null, putForNullKey is called.

    private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

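putForNullKey shows that if the map already contains a null key, the new value overwrites the old one and the old value is returned. If there is no null key yet, modCount is incremented and addEntry is called, which adds the mapping and increases the size of the map by one. Reading the source, the null key is treated as having hash 0 and is always stored in the chain at table[0].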

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

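Back in put: when the key is not null, its hash value is computed by the hash function. The bit arithmetic looks intimidating at first, but its purpose is simple: it mixes the high-order bits of hashCode() into the low-order bits, because indexFor only looks at the low-order bits.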

    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
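To see what the shifts and XORs buy, the sketch below applies the same two mixing lines to hash codes that differ only in their high bits. SpreadDemo, spread and the sample values are assumptions made for this example, and the hashSeed and String special case are left out.

```java
class SpreadDemo {
    // The two mixing steps from the hash() excerpt, without the hashSeed part.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Same masking as indexFor: keeps only the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000;  // the low 16 bits of both values are zero,
        int b = 0x20000;  // so they differ only in the high bits
        int length = 16;
        System.out.println("raw:    " + indexFor(a, length) + " vs " + indexFor(b, length));
        System.out.println("spread: " + indexFor(spread(a), length)
                + " vs " + indexFor(spread(b), length));
    }
}
```

Without the mixing step both values land in slot 0 of a 16-slot table, because the mask discards the only bits in which they differ. After mixing, the high bits influence the low bits and the two values end up in different slots.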

Once the hash is computed, indexFor turns it into an array index.

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length-1);
    }

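A modulo operation is the usual way to map a hash to a position in the array, that is, hash mod length; Hashtable does it this way. Modulo spreads elements evenly, but division is slow on a computer. Here the & operator is used instead, which, as I learned from reading around, keeps the distribution even and is considerably faster. It took me a while to see why. As mentioned earlier, the capacity of the map is always a power of two. When length is a power of two, h & (length-1) is exactly h mod length for non-negative h, so all the benefits of modulo are preserved. Moreover, a power of two is even, so length-1 is odd and its lowest bit is 1 (in fact all of its low bits are 1). The resulting index can therefore be odd or even, and every slot is reachable. If length were odd, length-1 would be even, its lowest bit would be 0, and every index would be even, leaving half the slots unused. An example: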

Take two arrays of length 16 and 15, and two hash values, 9 and 8.

    h & (length-1)    binary          index
    8 & (15-1)        1000 & 1110     1000
    9 & (15-1)        1001 & 1110     1000
    8 & (16-1)        1000 & 1111     1000
    9 & (16-1)        1001 & 1111     1001

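With the array of length 15 the two hashes clearly collide: both entries can only be stored in the linked list at the same index. With the array of length 16 the indices differ, so both entries sit directly in the array and no chain has to be walked to find them, which makes lookups noticeably faster.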
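The same comparison can be run in code. The sketch below counts which slots are reachable for table lengths 16 and 15; IndexForDemo and reachableSlots are names invented for this example, and only indexFor mirrors the source shown earlier.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects the distinct slots hit by the hashes 0..255 for a given table length.
    static Set<Integer> reachableSlots(int length) {
        Set<Integer> slots = new TreeSet<>();
        for (int h = 0; h < 256; h++) {
            slots.add(indexFor(h, length));
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println("length 16: " + reachableSlots(16));  // every slot is used
        System.out.println("length 15: " + reachableSlots(15));  // only even slots are used
        // For a power-of-two length the mask agrees with a non-negative modulo,
        // even when the hash itself is negative.
        System.out.println(indexFor(-7, 16) == Math.floorMod(-7, 16));
    }
}
```

With length 15 the mask is 1110 in binary, so the lowest bit of every index is 0 and only 8 of the 15 slots can ever be used.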

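Back in put: once the index is found, the chain at that index is traversed to see whether a mapping with the same hash and the same key already exists. If it does, the old value is overwritten and returned; otherwise a new mapping is added.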

Reading data:

The get method in the JDK looks like this:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

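As you can see, get delegates to getEntry. This is HashMap's own helper method (it is not part of the Map.Entry interface), and it returns the Entry that holds the key, or null if there is none.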

    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }

So get first computes the hash of the key, finds the index in the array, then compares keys along the chain and returns the matching value.

That covers storing and retrieving data in HashMap.
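The put and get paths can be condensed into a toy map that chains colliding entries in linked lists. This is a teaching sketch under simplifying assumptions, not the JDK implementation: TinyChainedMap is a hypothetical class with a fixed power-of-two capacity, no resizing, no null keys and no extra hash mixing.

```java
// Hypothetical teaching class, far simpler than java.util.HashMap.
class TinyChainedMap<K, V> {
    // One node of a bucket's singly linked chain.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    TinyChainedMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1);
    }

    // Overwrites and returns the old value if the key exists, else adds a node.
    V put(K key, V value) {
        int hash = key.hashCode();
        int i = indexFor(hash);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Node<>(hash, key, value, table[i]);  // new node becomes the chain head
        size++;
        return null;
    }

    // Walks the chain at the key's index and returns the matching value, or null.
    V get(Object key) {
        int hash = key.hashCode();
        for (Node<K, V> e = table[indexFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

With a capacity of 4, the Integer keys 1 and 5 both map to index 1, so they share one chain, yet each is still found because the lookup compares hash and key along the chain.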

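In real code, the key and value of a HashMap can be any reference type, including classes we define ourselves. When a custom class is used as a key, however, that class must override two important methods inherited from Object: hashCode(), which HashMap uses to compute the hash, and equals(), which it uses to compare keys. For two Student objects, for example, hashCode() should be derived from something unique to the student, such as the student ID, and equals() should be written according to the specific requirements, while staying consistent with hashCode(): objects that are equal must return the same hash code.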
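Here is a minimal sketch of such a key class. The Student class, its fields and the choice of the student ID as the identity are assumptions for the example; a real class would pick whatever defines equality in its domain.

```java
import java.util.HashMap;
import java.util.Map;

class StudentKeyDemo {
    // Hypothetical key class: two students are equal when their IDs are equal.
    static final class Student {
        private final String id;
        private final String name;

        Student(String id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Student)) return false;
            return id.equals(((Student) o).id);
        }

        // Derived from the same field as equals(), so equal objects share a hash code.
        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public String toString() {
            return name + "(" + id + ")";
        }
    }

    public static void main(String[] args) {
        Map<Student, Integer> scores = new HashMap<>();
        scores.put(new Student("s001", "Li"), 90);
        // A different object with the same ID finds the same mapping.
        System.out.println(scores.get(new Student("s001", "Li")));
        System.out.println(scores.containsKey(new Student("s002", "Wang")));
    }
}
```

Without the two overrides, the second Student("s001", ...) would be a different key, because Object's default equals() compares object identity.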

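The JDK 1.6 documentation points out that the HashMap implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map: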

    Map m = Collections.synchronizedMap(new HashMap(...));
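As a small sketch of the wrapped map in use, the example below lets two threads insert disjoint key ranges into one synchronized map. SyncMapDemo and fillConcurrently are illustrative names; the thread count and key ranges are arbitrary choices.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncMapDemo {
    // Two threads each insert 'perThread' distinct keys into one shared map.
    static int fillConcurrently(int perThread) throws InterruptedException {
        final Map<Integer, Integer> m =
                Collections.synchronizedMap(new HashMap<Integer, Integer>());
        Thread a = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                m.put(i, i);
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < perThread; i++) {
                m.put(perThread + i, i);
            }
        });
        a.start();
        b.start();
        a.join();
        b.join();
        return m.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every put is serialized by the wrapper, so no insertion is lost.
        System.out.println(fillConcurrently(1000));
    }
}
```

The wrapper makes each individual call atomic. Iterating over the map still requires holding the map's own lock manually for the duration of the loop, as the Collections.synchronizedMap documentation describes.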
