Java Basics - HashMap (JDK 1.7)

IT_peng

Article link: https://blog.csdn.net/IT_peng/article/details/103099847

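Do you know the answers to these questions?

Does a HashMap iterate its entries in the same order in which they were stored?
How is an entry with key == null stored?
By how much does the capacity grow on each resize?
Load factor
hashCode
Randomness of the data distribution
Concurrency problems
- Circular linked list (infinite loop)
- Data loss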

Constants and fields defined in the class

    /**
     * The default initial capacity - MUST be a power of two.
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /**
     * The load factor used when none specified in constructor (0.75 by default).
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /**
     * An empty table instance to share when the table is not inflated.
     */
    static final Entry<?, ?>[] EMPTY_TABLE = {};

    /**
     * The table, resized as necessary. Length MUST always be a power of two.
    transient Entry<K, V>[] table = (Entry<K, V>[]) EMPTY_TABLE;

    /**
     * The number of key-value mappings contained in this map
     * (the number of entries, not the number of buckets).
     */
    transient int size;

    /**
     * The next size value at which to resize (capacity * load factor).
     *
     * @serial
     */
    // If table == EMPTY_TABLE then this is the initial capacity at which the
    // table will be created when inflated.
    int threshold;

    /**
     * The load factor for the hash table.
     *
     * @serial
     */
    final float loadFactor;

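Background needed for the analysis

Bitwise operations:
Reference: https://zhuanlan.zhihu.com/p/30108890

Left shift operator: <<
	Discards the given number of bits on the left and fills the right with 0.

Signed right shift operator: >>
	Discards the given number of bits on the right and fills the left with the sign bit.

Unsigned right shift operator: >>>
	Discards the given number of bits on the right and fills the left with 0.

Bitwise XOR operator (^)
	If the corresponding bits are equal the result bit is 0, otherwise it is 1.

Bitwise AND operator (&)
	Rule: write both numbers in binary and compare them bit by bit; a result bit is 1 only if both bits are 1, otherwise it is 0.
For example, 129 & 128:
	129 in binary is 10000001 and 128 is 10000000. Comparing bit by bit gives 10000000, which is 128.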
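
To make the operators above concrete, here is a minimal sketch that evaluates each of them once. The class name BitOpsDemo and its helper methods are illustrative and do not come from the JDK source.

```java
// Illustrative helper class (hypothetical name) for the five operators above.
final class BitOpsDemo {
    static int shiftLeft(int x, int n) { return x << n; }            // fills the right with 0
    static int shiftRight(int x, int n) { return x >> n; }           // fills the left with the sign bit
    static int unsignedShiftRight(int x, int n) { return x >>> n; }  // fills the left with 0
    static int xor(int a, int b) { return a ^ b; }
    static int and(int a, int b) { return a & b; }

    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 4));            // 16, the default initial capacity
        System.out.println(shiftRight(-8, 1));          // -4, the sign is preserved
        System.out.println(unsignedShiftRight(-8, 28)); // 15, the top four bits of 0xFFFFFFF8
        System.out.println(xor(6, 3));                  // 5  (110 ^ 011 = 101)
        System.out.println(and(129, 128));              // 128
    }
}
```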

Usage:

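import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// HashMapDemo is an illustrative class name; the original snippet omitted the class declaration.
public class HashMapDemo {
    public static void main(String[] args) {
        // 1. Create a HashMap.
        HashMap<String, Integer> map = new HashMap<String, Integer>();

        // 2. Add key-value pairs to the HashMap.
        map.put("Android1 - ", 1);
        map.put("Android2 - ", 2);
        map.put("Android3 - ", 3);
        map.put("Android4 - ", 4);
        map.put("Android5 - ", 5);

        // 3. Iterate over the entries and print each key followed by its value.
        Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
        for (Map.Entry<String, Integer> entry : entrySet) {
            System.out.print(entry.getKey());
            System.out.println(entry.getValue());
        }
    }
}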


 // Output (the order differs from the insertion order)
  Android3 - 3
  Android1 - 1
  Android4 - 4
  Android5 - 5
  Android2 - 2
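
The output above answers the first question: HashMap does not iterate in insertion order. As a contrast, the sketch below stores the same keys in java.util.LinkedHashMap, which is documented to iterate in insertion order. The class name InsertionOrderDemo and its helper method are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative contrast (hypothetical class name): LinkedHashMap keeps insertion order.
final class InsertionOrderDemo {
    static List<String> keysInIterationOrder() {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("Android1 - ", 1);
        map.put("Android2 - ", 2);
        map.put("Android3 - ", 3);
        map.put("Android4 - ", 4);
        map.put("Android5 - ", 5);
        // Copy the keys in the order the map iterates them.
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        for (String key : keysInIterationOrder()) {
            System.out.println(key);
        }
    }
}
```

If the iteration order matters, LinkedHashMap (insertion order) or TreeMap (sorted order) is usually the better fit than HashMap.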

Let's see how HashMap() is instantiated


public HashMap() {
		// DEFAULT_INITIAL_CAPACITY == 16: the default initial capacity
		// DEFAULT_LOAD_FACTOR == 0.75: the default load factor
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

 public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        init();
    }

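A word on what the load factor does:

The larger the load factor, the more entries the table holds before it is resized, so space utilization is higher but collisions become more likely.
Conversely, the smaller the load factor, the fewer entries the table holds before it is resized, so collisions are less likely but more space is wasted.
The more collisions there are, the more expensive a lookup becomes; the fewer collisions, the cheaper the lookup.
The load factor therefore has to strike a balance between the chance of collisions and space utilization.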
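
To put numbers on this trade-off, the sketch below reproduces the threshold formula that inflateTable and resize use in this article's JDK 1.7 source, namely (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1). The class name ThresholdDemo is illustrative, and the constant is copied here only so the example is self-contained.

```java
// Illustrative re-implementation (hypothetical class name) of the threshold formula.
final class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Number of entries at which a table of the given capacity becomes eligible for resizing.
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12
        System.out.println(threshold(16, 0.5f));  // 8, resizes earlier and wastes more space
        System.out.println(threshold(16, 1.0f));  // 16, resizes later and collides more
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```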

How does HashMap's put method work?

 public V put(K key, V value) {
 		
        if (table == EMPTY_TABLE) {
            // Step 1: initialize the table
            inflateTable(threshold);
        }
        // Step 2: special handling for key == null
        if (key == null)
            return putForNullKey(value);
        // The normal put path
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
		/**
		 * Not safe under multithreading (modCount++ is not atomic).
		 */
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
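
The return value of put follows directly from the loop above: when a matching key is found, the old value is returned, otherwise null is returned. The sketch below checks this against java.util.HashMap; the class name PutReturnDemo and its helper method are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative check (hypothetical class name) of what put returns.
final class PutReturnDemo {
    // Returns { result of first put, result of second put, value stored afterwards, map size }.
    static Integer[] putTwice() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        Integer first = map.put("a", 1);   // no previous mapping for "a"
        Integer second = map.put("a", 2);  // replaces the value and returns the old one
        return new Integer[] { first, second, map.get("a"), map.size() };
    }

    public static void main(String[] args) {
        Integer[] r = putTwice();
        System.out.println(r[0]); // null
        System.out.println(r[1]); // 1
        System.out.println(r[2]); // 2
        System.out.println(r[3]); // 1
    }
}
```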

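Let's start with step 1 (the comments already explain it clearly).
It shows that the table is not allocated when new HashMap() is called; it is allocated lazily on the first put.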

  /**
     * Inflates the table.
     */
    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Allocate the table with the computed capacity
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

    // Guarantees that the array size is always a power of two.
    // For example, new HashMap(20) ends up with an initial array size of 32.
 private static int roundUpToPowerOf2(int number) {
        // assert number >= 0 : "number must be non-negative";
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }



    /**
     * Initialize the hashing mask value. We defer initialization until we
     * really need it.
     * (Initializes the hash seed on demand.)
     * Reference: https://segmentfault.com/a/1190000018520768
     */
    final boolean initHashSeedAsNeeded(int capacity) {
        // hashSeed != 0 means alternative hashing is currently in use
        boolean currentAltHashing = hashSeed != 0;
        // Use alternative hashing if the VM has booted and the capacity has reached the threshold
        boolean useAltHashing = sun.misc.VM.isBooted() &&
                (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        // XOR: false when both flags are equal, true when they differ
        boolean switching = currentAltHashing ^ useAltHashing;
        if (switching) {
            // Set hashSeed to a random value when switching alternative hashing on, or to 0 when switching it off
            hashSeed = useAltHashing
                    ? sun.misc.Hashing.randomHashSeed(this)
                    : 0;
        }
        return switching;
    }

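roundUpToPowerOf2 deserves a special note

Rounding the capacity up to a power of two is what lets indexFor compute the bucket index as h & (length - 1). When length is a power of two, length - 1 is a mask of all 1 bits, so every slot of the array is reachable and the keys are spread as evenly as their hash values allow. That lowers the chance of collisions, which in turn means a lookup rarely has to walk a long linked list in a bucket and stays efficient.

Let's test it:

If the initial capacity is 13, 14 or 15,
what capacity is actually computed?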

        HashMap<String, Integer> map1 = new HashMap<String, Integer>(13);
        map1.put("1", 1);

        HashMap<String, Integer> map2 = new HashMap<String, Integer>(14);
        map2.put("1", 1);

        HashMap<String, Integer> map3 = new HashMap<String, Integer>(15);
        map3.put("1", 1);

 // What is the capacity of each table after initialization?

 // It is 16 in all three cases.

 // This is equivalent to the calls below (roundUpToPowerOf2 is private to HashMap, so they are shown for illustration only)
        System.out.println(roundUpToPowerOf2(13));
        System.out.println(roundUpToPowerOf2(14));
        System.out.println(roundUpToPowerOf2(15));
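
Because roundUpToPowerOf2 is private, it cannot be called from outside HashMap. The sketch below copies its body from the JDK 1.7 source quoted earlier into a standalone class so the claim can be checked; the class name RoundUpDemo is illustrative.

```java
// Standalone copy (hypothetical class name) of the JDK 1.7 rounding logic.
final class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= number, capped at MAXIMUM_CAPACITY.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(13)); // 16
        System.out.println(roundUpToPowerOf2(14)); // 16
        System.out.println(roundUpToPowerOf2(15)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16, already a power of two
        System.out.println(roundUpToPowerOf2(20)); // 32
    }
}
```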

Now step 2:
If execution reaches this point, we know that key == null.
How does this path differ from the normal one?

 /**
     * Offloaded version of put for null keys
     * <p>
     */
    private V putForNullKey(V value) {
    	// putForNullKey #1
        for (Entry<K, V> e = table[0]; e != null; e = e.next) {
            // If an entry with a null key already exists, replace its value.
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }

Analysis of (putForNullKey #1):
As we can see,
an entry with key == null is always stored in table[0].
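
The bucket index is an internal detail, but the visible behaviour of a null key can be checked through the public API. The sketch below uses java.util.HashMap to show that there is at most one null-key entry and that a second put replaces its value; the class name NullKeyDemo and its helper are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative check (hypothetical class name) of null-key behaviour.
final class NullKeyDemo {
    static Map<String, String> mapWithNullKey() {
        Map<String, String> map = new HashMap<String, String>();
        map.put(null, "first");
        map.put(null, "second"); // same key, so the value is replaced
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = mapWithNullKey();
        System.out.println(map.size());            // 1
        System.out.println(map.get(null));         // second
        System.out.println(map.containsKey(null)); // true
    }
}
```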

Let's see how the entry is actually added

    void addEntry(int hash, K key, V value, int bucketIndex) {

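        /**
         * If size has reached the threshold and the target bucket is
         * already occupied, resize the table.
         *
         * Why 2 * table.length? Doubling keeps the length a power of two.
         */
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // size has reached the threshold: grow the table,
            // doubling it so that the length stays a power of two
            resize(2 * table.length);
            // Recompute the hash of the key
            hash = (null != key) ? hash(key) : 0;
            // Recompute the bucket index for that hash in the new table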
            bucketIndex = indexFor(hash, table.length);
        }
		
        createEntry(hash, key, value, bucketIndex);
    }


    final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h = h ^ ((h >>> 20) ^ (h >>> 12));
        return h ^ (h >>> 7) ^ (h >>> 4);
    }


    /**
     * Returns index for hash code h, i.e. the slot in table.
     */
    static int indexFor(int h, int length) {
        // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
        return h & (length - 1);
    }

 void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K, V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        /**
         * Not safe under multithreading (size++ is not atomic).
         */
        size++;
    }

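If the very first call is
hashMap.put(null, "value")
the arguments passed are
createEntry(0, null, value, 0)
table[0] = new Entry(0, null, value, e)   // e is the previous head of table[0], which is null on the first put

Let me first cover the part that took me the longest to understand.
(I think this is the essence of the Java 1.7 HashMap.)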

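  The hash(Object k) function

// and
 /**
     * Put simply, it takes the low n bits of the hash value.
     * For example, when the array length is 32,
     * it takes the low 5 bits of the key's hash value
     * as its index into the array.
     */

/**
	The question: why not simply use % ?
*/
 indexFor(int h, int length)

// More precisely, it comes down to these three lines
hash(Object k) -> {
		h ^= k.hashCode();
      	h = h ^ ((h >>> 20) ^ (h >>> 12));
        return h ^ (h >>> 7) ^ (h >>> 4);
    }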

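After searching online, I learned that this is called a

perturbation function

But why is it called a perturbation function?

What exactly does it perturb?
And what would happen without it, given that calls to hash() appear all over the HashMap source?

What happens if we do not perturb?
Image source and reference: https://www.cnblogs.com/jajian/p/10385063.html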

Part of the figure	Description
Yellow box	table
Boxes inside the yellow box	hash slots
Vertically connected boxes	hash buckets

If the hash function looks tedious, can we simply drop it?
I think we can.

  // Without hash() it would look like this
 indexFor(key.hashCode(), table.length)

Let's look at an example:
The following is based on: https://www.hollischuang.com/archives/2091

6 & 7 = 6
10 & 7 = 2

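[Figure missing from the source]
Here is another set of data.

Do you notice anything?

Or, to go one step further, let's rewrite indexFor in a form that is easier to read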

static int indexFor(int h, int length) {
		return h%length;
    }
    
 // A set of values that all land in the same hash slot:
 12 % 16 =12
 28 % 16 =12
 108 % 16 =12
 140 % 16 =12
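
On the question of why the real indexFor uses & rather than %: for a power-of-two length and a non-negative h, both give the same index, but & also stays non-negative when h is negative, whereas % in Java can return a negative remainder. The sketch below checks both points; the class name IndexForDemo and its methods are illustrative.

```java
// Illustrative comparison (hypothetical class name) of mask-based and modulo-based indexing.
final class IndexForDemo {
    // The form used by JDK 1.7; length must be a power of two.
    static int indexByMask(int h, int length) { return h & (length - 1); }

    // The easier-to-read form discussed above.
    static int indexByMod(int h, int length) { return h % length; }

    // True if both forms agree for every h in [0, limit).
    static boolean agreeForNonNegative(int length, int limit) {
        for (int h = 0; h < limit; h++) {
            if (indexByMask(h, length) != indexByMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeForNonNegative(16, 10000)); // true
        System.out.println(indexByMask(140, 16));           // 12
        System.out.println(indexByMod(-4, 16));             // -4, not a valid array index
        System.out.println(indexByMask(-4, 16));            // 12
    }
}
```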

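The technical term: hash collision

The collisions are far too obvious.

In the first example only the low bits take part in the index; the high bits do not, so the distinguishing features of the hash code are not fully used.

The purpose of the hash() function in Java is:

to mix the features of the high bits with those of the low bits and so reduce the probability of hash collisions; in other words, to make a change in any single bit affect the final result as far as possible.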
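
The effect can be checked on the four values from the modulo example (12, 28, 108 and 140), which all share the same low 4 bits. The sketch below copies the two shift-and-XOR lines from the JDK 1.7 hash() and assumes that hashSeed is 0 and that the keys are Integer objects, whose hashCode() is the int value itself. The class name PerturbDemo is illustrative.

```java
// Illustrative copy (hypothetical class name) of the JDK 1.7 perturbation, assuming hashSeed == 0.
final class PerturbDemo {
    static int perturb(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] hashCodes = { 12, 28, 108, 140 };
        for (int h : hashCodes) {
            // Without perturbation every value maps to slot 12;
            // with perturbation the slots are 12, 13, 10 and 5.
            System.out.println(h + ": " + indexFor(h, 16) + " -> " + indexFor(perturb(h), 16));
        }
    }
}
```

For these small values only the h >>> 4 and h >>> 7 terms contribute, which is enough to fold bits 4 to 7 into the low 4 bits and separate the four keys.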

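What does the result look like after perturbation?
[Figure missing from the source]

With all of that understood,

let's see what happens when a put triggers a resize.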

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        transfer(newTable, initHashSeedAsNeeded(newCapacity));

        table = newTable;

        /**
         * Recompute the threshold after transfer
         */
        threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
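
A useful consequence of doubling the table: as long as the stored hash is not recomputed (that is, rehash is false), an entry at index i in the old table ends up either at i or at i + oldCapacity in the new table, because only one additional bit of the hash joins the mask. The sketch below checks this property; the class name ResizeSplitDemo and its methods are illustrative.

```java
// Illustrative check (hypothetical class name) of where an entry lands after doubling.
final class ResizeSplitDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if, after doubling, hash h stays at its old index or moves up by oldCapacity.
    static boolean staysOrMovesByOldCapacity(int h, int oldCapacity) {
        int oldIndex = indexFor(h, oldCapacity);
        int newIndex = indexFor(h, oldCapacity * 2);
        return newIndex == oldIndex || newIndex == oldIndex + oldCapacity;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(12, 16) + " -> " + indexFor(12, 32)); // 12 -> 12
        System.out.println(indexFor(28, 16) + " -> " + indexFor(28, 32)); // 12 -> 28
        boolean holds = true;
        for (int h = 0; h < 100000; h++) {
            holds &= staysOrMovesByOldCapacity(h, 16);
        }
        System.out.println(holds); // true
    }
}
```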

    /**
     * Transfers all entries from current table to newTable.
     * (This part is hard to follow at first.)
     */
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        // This is known as head insertion: a bucket holding 1 2 3 4 becomes 4 3 2 1
        for (Entry<K, V> e : table) {
            while (null != e) {
                Entry<K, V> next = e.next;
                /**
                 * Recompute the entry's hash if required
                 */
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                /**
                 * Then locate the new bucket from the hash value.
                 */
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }
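
The reversal caused by head insertion is easier to see on a single bucket. The sketch below is a simplified model, not the JDK code: it assumes every node of the old bucket lands in the same new bucket and uses a hypothetical Node class in place of HashMap.Entry. The inner loop is the same four statements as in transfer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified single-bucket model (hypothetical names) of head insertion during transfer.
final class HeadInsertDemo {
    static final class Node {
        final int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Builds a list in the given order, for example 1 -> 2 -> 3 -> 4.
    static Node of(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            head = new Node(values[i], head);
        }
        return head;
    }

    // Moves every node to the head of a new bucket, as transfer does.
    static Node transfer(Node oldHead) {
        Node newHead = null;
        Node e = oldHead;
        while (e != null) {
            Node next = e.next; // remember the rest of the old list
            e.next = newHead;   // link e in front of the new bucket
            newHead = e;        // e becomes the new head
            e = next;
        }
        return newHead;
    }

    static List<Integer> toList(Node head) {
        List<Integer> out = new ArrayList<Integer>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toList(transfer(of(1, 2, 3, 4)))); // [4, 3, 2, 1]
    }
}
```

This reversal is what makes a concurrent resize dangerous: two threads that relink the same nodes in opposite orders can leave two nodes pointing at each other.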

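The conclusion first:
under multithreading, HashMap can lose data and can spin in an infinite loop (a circular linked list created during transfer, often loosely described as a deadlock).
Let the facts speak: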

import java.util.HashMap;

/**
 *  Data loss demo
 */
public class HashMapTest3 {


    /**
     *  Concurrent puts that trigger resize
     */
    public static void main(String[] args) throws InterruptedException {

        HashMapEndLessLoop test = new HashMapEndLessLoop();
        test.goTest();
        Thread.sleep(100000);

    }


    public static class HashMapEndLessLoop {
        private HashMap<Long, EasyCoding> map = new HashMap<Long, EasyCoding>();

        public void goTest() {
            for (int i = 0; i < 500; i++) {
                final int s = i;
                (new Thread() {
                    public void run() {
                        map.put(System.nanoTime(), new EasyCoding());
                        System.out.println(s);
                    }
                }).start();
            }
        }
    }

    static class EasyCoding {

    }

}
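
For comparison, here is a minimal sketch of the same experiment with java.util.concurrent.ConcurrentHashMap, which is designed for concurrent writers. It differs from the demo above in two deliberate ways that are assumptions of this sketch: it uses distinct keys 0..n-1 instead of System.nanoTime() so the expected size is known, and it joins the threads instead of sleeping. The class name ConcurrentPutDemo is illustrative, and the lambda requires Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative thread-safe variant (hypothetical class name) of the demo above.
final class ConcurrentPutDemo {
    // Starts threadCount threads, each putting one distinct key, and returns the final map size.
    static int putFromThreads(int threadCount) {
        final Map<Long, Object> map = new ConcurrentHashMap<Long, Object>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final long key = i;
            threads[i] = new Thread(() -> map.put(key, new Object()));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait until every put has finished
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", ex);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(putFromThreads(500)); // 500
    }
}
```

With a plain HashMap in place of ConcurrentHashMap the same test may report fewer than 500 entries, which is the data loss described above.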