Analysis of the HashMap Source Code in JDK 7


morris131


Column: Multithreading and High Concurrency. Tags: linked list, java, hashmap, hashtable, data structure

Article link: https://blog.csdn.net/u022812849/article/details/114840411


Most development today uses JDK 8, so the JDK 7 HashMap source is hard to find. For convenience, here is a link for reading the JDK 7 HashMap source online: __java7 HashMap__.

Data Structure

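HashMap is implemented as an array plus singly linked lists. It maintains an internal array, and each element of that array is the head of a singly linked list (a bucket).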
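The following is a minimal sketch of the array-plus-linked-list layout. It is not the JDK code. `MiniChainedMap` is a hypothetical class with a fixed 16-slot array. It has no resizing and no null keys. It stores each key in the list of bucket `hashCode() & (length - 1)` and inserts new keys at the head of the list, as JDK 7 does.

```java
// Hypothetical minimal chained hash map: an array of buckets, each the head of a
// singly linked list. No resizing and no null keys, unlike the real HashMap.
class MiniChainedMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next; // next node in the same bucket

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two length so that hash & (length - 1) is a valid index
    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    private int indexFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { // key already present: overwrite
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new key: insert at the head
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MiniChainedMap<String, Integer> map = new MiniChainedMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa", so it lands in the same bucket
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```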


Key fields of HashMap:

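static final int DEFAULT_INITIAL_CAPACITY = 16; // default initial capacity; must be a power of two
static final int MAXIMUM_CAPACITY = 1 << 30; // maximum capacity (2^30, a power of two); larger requested capacities are clamped to this value
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE; // the bucket array; EMPTY_TABLE ({}) until the first put
transient int size; // number of key-value mappings (not the array length)
int threshold; // resize threshold = capacity * loadFactor (holds the initial capacity until the array is allocated)
final float loadFactor; // load factor, 0.75 by default
transient int modCount; // number of structural modifications, used by the fail-fast iterators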
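The `modCount` field is what makes HashMap's iterators fail-fast. The sketch below runs against the real `java.util.HashMap` on a current JDK, which uses the same mechanism. `FailFastDemo` and its methods are hypothetical names. Note that the HashMap Javadoc describes fail-fast behavior as best-effort, not guaranteed. In the single-threaded case below, the check fires on the iterator's next call to `next()`.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class; it only uses the real java.util.HashMap API
class FailFastDemo {
    // Adds a key while iterating: the structural change bumps modCount, so the
    // iterator's next() sees modCount != expectedModCount and throws
    static boolean modifyDuringIteration() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.put(key + "-new", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removing through the iterator itself is allowed: it keeps expectedModCount in sync
    static int removeViaIterator() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        for (Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size(); // only "b" remains
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIteration()); // true
        System.out.println(removeViaIterator());     // 1
    }
}
```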

Next, the fields of Entry, the element type of the array:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // next node in the same bucket's linked list
    int hash;        // cached hash of the key
    // constructor and Map.Entry methods omitted
}
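When several keys land in the same bucket, `next` chains them together. The sketch below uses a hypothetical standalone `Entry` class with the same four fields. It relies on the fact that `"Aa"` and `"BB"` have the same `String.hashCode()` (2112), so they always share a bucket.

For simplicity it caches the raw `hashCode()`. JDK 7 instead stores the output of its own `hash()` function. The lookup compares the cached hash before calling `equals()`, which mirrors the condition in `put()` and `getEntry()`.

```java
class EntryChainDemo {
    // Hypothetical stand-in for JDK 7's HashMap.Entry, with the same four fields
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next; // next entry in the same bucket
        int hash;         // cached hash (here simply key.hashCode())

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks one bucket's chain: cheap int comparison first, equals() only on a hash match
    static <K, V> V find(Entry<K, V> head, Object key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = head; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || key.equals(e.key))) {
                return e.value;
            }
        }
        return null;
    }

    // Two-entry chain built from colliding keys: "Aa" and "BB" both hash to 2112
    static Entry<String, Integer> collidingChain() {
        Entry<String, Integer> head = new Entry<>("Aa".hashCode(), "Aa", 1, null);
        return new Entry<>("BB".hashCode(), "BB", 2, head); // head insertion
    }

    static int length(Entry<?, ?> head) {
        int n = 0;
        for (Entry<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Entry<String, Integer> head = collidingChain();
        System.out.println(length(head));     // 2
        System.out.println(find(head, "Aa")); // 1
        System.out.println(find(head, "Ab")); // null
    }
}
```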

Constructor

HashMap has several overloaded constructors, and all of them end up calling this one:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                loadFactor);

    this.loadFactor = loadFactor; // 0.75f by default
    threshold = initialCapacity;  // 16 by default
    init();                       // empty hook for subclasses
}

The constructor only sets loadFactor and threshold. It does not allocate the array; the array is created lazily when the first element is added.
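You can observe the argument checks and the lazy allocation with the real `java.util.HashMap` on a current JDK. Its constructor performs the same validation and likewise defers creating the array. `ConstructorCheckDemo` and its helpers are hypothetical names.

Requesting `Integer.MAX_VALUE` buckets succeeds: the capacity is clamped, and nothing large is allocated until the first insert.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class built on the real java.util.HashMap constructors
class ConstructorCheckDemo {
    // Returns true if running the action throws IllegalArgumentException
    static boolean throwsIae(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // An over-large capacity is clamped rather than rejected; because the array is
    // allocated lazily, constructing this map does not allocate a huge table
    static Map<String, Integer> hugeButEmpty() {
        return new HashMap<>(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(throwsIae(() -> new HashMap<String, Integer>(-1)));            // true
        System.out.println(throwsIae(() -> new HashMap<String, Integer>(16, 0f)));        // true
        System.out.println(throwsIae(() -> new HashMap<String, Integer>(16, Float.NaN))); // true
        System.out.println(hugeButEmpty().isEmpty());                                     // true
    }
}
```

Calling put() on the clamped map would force the very large array to be allocated, so the sketch only checks isEmpty().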

put()

public V put(K key, V value) {
    // Lazy initialization: allocate the array when the first element is inserted
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // A null key is stored in the bucket at table[0]
    if (key == null)
        return putForNullKey(value);

    // Compute the hash of the key
    int hash = hash(key);

    // Find the corresponding array index
    int i = indexFor(hash, table.length);

    // Walk the list in that bucket; if the key already exists, overwrite the value and return the old one
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // The key is absent: add a new entry at the head of the list
    addEntry(hash, key, value, i);
    return null;
}
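The return-value contract of put() can be checked against the real `java.util.HashMap`, which keeps the same contract in later JDKs. put() returns the old value on an overwrite and null for a new key, and the null key is handled as a special case. Null is also returned when the old value was itself null. `PutSemanticsDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; records what the real HashMap returns
class PutSemanticsDemo {
    static List<Integer> putResults() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> results = new ArrayList<>();
        results.add(map.put("a", 1));   // null: "a" was absent
        results.add(map.put("a", 2));   // 1: old value returned, value overwritten
        results.add(map.put(null, 5));  // null: a null key is allowed (JDK 7: putForNullKey, table[0])
        results.add(map.get(null));     // 5
        results.add(map.size());        // 2: "a" and the null key
        return results;
    }

    public static void main(String[] args) {
        System.out.println(putResults()); // [null, 1, null, 5, 2]
    }
}
```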

inflateTable()

Allocating the array:

private void inflateTable(int toSize) {
    // Round up to a power of two: e.g. new HashMap(20) gets an initial array size of 32
    int capacity = roundUpToPowerOf2(toSize);

    // Resize threshold = capacity * loadFactor; with the defaults, 16 * 0.75 = 12
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);

    // Allocate the array
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
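inflateTable() relies on roundUpToPowerOf2(), whose body is not shown above. The sketch below implements the same contract: the smallest power of two that is at least the request, clamped to [1, MAXIMUM_CAPACITY]. It is built on `Integer.highestOneBit`.

Treat it as a re-implementation and compare it against the JDK 7 source linked at the top. `InflateTableSketch` and `thresholdFor` are hypothetical names.

```java
class InflateTableSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Re-implementation (not copied from the JDK) of the contract of roundUpToPowerOf2():
    // the smallest power of two >= number, clamped to [1, MAXIMUM_CAPACITY]
    static int roundUpToPowerOf2(int number) {
        if (number >= MAXIMUM_CAPACITY) {
            return MAXIMUM_CAPACITY;
        }
        // (number - 1) << 1 has its highest one-bit at the next power of two >= number
        return number > 1 ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Same formula as inflateTable(): capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 16, 17, 20}) {
            int capacity = roundUpToPowerOf2(requested);
            System.out.println(requested + " -> capacity " + capacity
                    + ", threshold " + thresholdFor(capacity, 0.75f));
        }
    }
}
```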

indexFor()

Computes the array index for a key's hash.

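static int indexFor(int h, int length) {
    // When length is a power of two, h & (length-1) keeps the low bits of h: it equals h % length
    // for non-negative h and is always in [0, length); the bitwise AND is cheaper than %
    return h & (length-1);
}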
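Why must the length be a power of two? The sketch below compares indexFor() with `%` and `Math.floorMod`. For a power-of-two length, `h & (length - 1)` equals `Math.floorMod(h, length)` for every int h. That includes negative hashes, where `h % length` would be negative.

With a length such as 10, the mask 9 (binary 1001) can only produce 0, 1, 8 or 9, leaving most buckets unused. `IndexForDemo` and `reachableIndices` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

class IndexForDemo {
    // Same expression as JDK 7's indexFor(); only meaningful when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collects every index indexFor() produces for hashes in [from, to)
    static Set<Integer> reachableIndices(int length, int from, int to) {
        Set<Integer> seen = new TreeSet<>();
        for (int h = from; h < to; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int h : new int[] {7, 23, -7}) {
            System.out.printf("h=%d  h&15=%d  h%%16=%d  floorMod=%d%n",
                    h, indexFor(h, 16), h % 16, Math.floorMod(h, 16));
        }
        System.out.println(reachableIndices(16, -1000, 1000)); // every index 0..15
        System.out.println(reachableIndices(10, -1000, 1000)); // [0, 1, 8, 9]
    }
}
```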

addEntry()

Adds a node to a bucket's linked list (head insertion).

// If the number of entries has reached the threshold and the target bucket is already occupied, resize first
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the array length
        resize(2 * table.length);
        // Recompute the hash after resizing (the hash seed may have changed)
        hash = (null != key) ? hash(key) : 0;
        // Recompute the array index for the new length
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}

// Put the new entry at the head of the bucket's list, then increment size
void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
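The sketch below shows the two rules in addEntry() and createEntry(). First, the new entry becomes the bucket head. Second, a resize happens only when size has reached threshold and the target bucket is non-empty. `HeadInsertDemo`, its key-only `Entry`, and the helper methods are hypothetical simplifications.

```java
class HeadInsertDemo {
    // Hypothetical key-only entry; enough to show how the links change
    static final class Entry {
        final String key;
        Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Like createEntry(): the new entry points at the old head and becomes the new head
    static Entry insertAtHead(Entry head, String key) {
        return new Entry(key, head);
    }

    // Like the guard in JDK 7's addEntry(): resize only if size has reached the
    // threshold AND the target bucket already holds at least one entry
    static boolean shouldResize(int size, int threshold, Entry bucketHead) {
        return size >= threshold && bucketHead != null;
    }

    // Renders a bucket as "k1 -> k2 -> ..."
    static String render(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry bucket = null;
        for (String key : new String[] {"a", "b", "c"}) {
            bucket = insertAtHead(bucket, key);
        }
        System.out.println(render(bucket));               // c -> b -> a
        System.out.println(shouldResize(12, 12, null));   // false: empty bucket
        System.out.println(shouldResize(12, 12, bucket)); // true
    }
}
```

One consequence of the second condition: in JDK 7, a put into an empty bucket does not trigger a resize even when size is already at or above threshold.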

resize()

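resize() doubles the array length. transfer() inserts each moved entry at the head of its new bucket, so entries that end up in the same new bucket appear in reverse order.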

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    Entry[] newTable = new Entry[newCapacity];             // allocate the new array
    transfer(newTable, initHashSeedAsNeeded(newCapacity)); // move the entries of the old array into the new one
    table = newTable;                                      // replace the old array with the new one
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K, V> e : table) {
        while (null != e) {
            Entry<K, V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity); // index of the entry in the new array
            e.next = newTable[i];                  // insert at the head of that bucket's list
            newTable[i] = e;
            e = next;
        }
    }
}
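The sketch below runs transfer()'s loop, without rehashing, on hand-picked hashes. It uses hypothetical names (`TransferSketch`, a hash-only `Entry`) and shows two things:

- When the capacity doubles, an entry from old bucket i moves to either i or i + oldCapacity, depending on one extra hash bit.
- Entries that share a new bucket come out in reverse order.

```java
import java.util.ArrayList;
import java.util.List;

class TransferSketch {
    // Hypothetical entry holding only a hash; keys and values are not needed here
    static final class Entry {
        final int hash;
        Entry next;

        Entry(int hash, Entry next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Same loop shape as transfer() with rehash == false: head-insert every entry
    // of every old bucket into its bucket in the new table
    static Entry[] transfer(Entry[] oldTable, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (Entry e : oldTable) {
            while (e != null) {
                Entry next = e.next;
                int i = e.hash & (newCapacity - 1);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    static List<Integer> hashes(Entry head) {
        List<Integer> out = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) {
            out.add(e.hash);
        }
        return out;
    }

    // Old table of capacity 2: bucket 1 holds 7 -> 5 -> 3 -> 1 (inserted 1, 3, 5, 7 at the head)
    static Entry[] sampleOldTable() {
        Entry[] old = new Entry[2];
        for (int h : new int[] {1, 3, 5, 7}) {
            old[1] = new Entry(h, old[1]);
        }
        return old;
    }

    public static void main(String[] args) {
        Entry[] grown = transfer(sampleOldTable(), 4);
        System.out.println(hashes(grown[1])); // [1, 5]: stayed at index 1, order reversed
        System.out.println(hashes(grown[3])); // [3, 7]: moved to 1 + oldCapacity, order reversed
    }
}
```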

get()

public V get(Object key) {
    // A null key is stored in table[0], so only the list in table[0] needs to be searched
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

// Returns the value mapped to the null key
private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    // Compute the hash
    int hash = (key == null) ? 0 : hash(key);

    // Locate the bucket, then walk its list from the head until the key is found
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
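get() returns `null == entry ? null : entry.getValue()`. A null result therefore cannot tell "no mapping" apart from "mapped to null". containsKey() checks whether an entry exists, so it can tell them apart. The sketch below uses the real `java.util.HashMap`; `GetNullDemo` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class built on the real java.util.HashMap
class GetNullDemo {
    // One key explicitly mapped to null; "absent" is never inserted
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("present"));         // null: mapped to null
        System.out.println(map.get("absent"));          // null: no mapping
        System.out.println(map.containsKey("present")); // true
        System.out.println(map.containsKey("absent"));  // false
    }
}
```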

The infinite-loop problem

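Under concurrent use, resize() can corrupt the table and cause an infinite loop. This is not a deadlock: the affected thread spins on the CPU instead of blocking on a lock.

The problem arises when two threads both decide that the HashMap needs to grow and resize it at the same time. During a resize, the order of entries in a bucket's list is reversed. That is because transfer() moves each entry to the head of its new bucket rather than the tail, which avoids walking to the end of the list. If the two threads race, the shared Entry objects can end up linked in a cycle, and any later traversal of that bucket never terminates.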

The following walkthrough shows how the cycle forms.

Suppose the HashMap currently looks like this:

[Figure: the HashMap before resizing; one bucket holds the list a → b]

Recall the source of transfer():

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K, V> e : table) {
        while (null != e) {
            Entry<K, V> next = e.next; // #1
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

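Suppose thread A has just executed line #1 (so e = a and next = b) when it is descheduled, and thread B then runs the entire resize. The HashMap now looks like this (assuming both entries map to index 2 in the new table):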

[Figure: after thread B's resize, bucket 2 of the new table holds b → a]

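Thread A then resumes. Its local variables still hold e = a and next = b. However, thread B has already relinked the shared Entry objects, so b.next == a and a.next == null. Thread A keeps moving entries into its own, still-empty newTable with these three lines:

e.next = newTable[i];
newTable[i] = e;
e = next;

Tracing thread A's loop (i = 2 throughout):
1. e = a, next = b: a.next = null, newTable[2] = a, e = b.
2. e = b, next = b.next = a: b.next = a, newTable[2] = b, e = a.
3. e = a, next = a.next = null: a.next = b, newTable[2] = a, e = null, so the loop ends.

Now a.next == b and b.next == a: the list has become circular. transfer() itself returns. But any later get() or put() that walks this bucket looking for a key that is not in the list never reaches a null next, and loops forever.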
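The interleaving above can be replayed deterministically in a single thread. The sketch below uses a hypothetical `ResizeRaceSimulation` class with key-only entries, and fixes the index at 2 as assumed in the text. It runs thread B's transfer loop to completion between thread A's line #1 and the rest of thread A's loop, then checks the resulting links. It simulates the schedule; it does not reproduce a real race.

```java
class ResizeRaceSimulation {
    // Hypothetical key-only entry; hashes and values do not matter for the link corruption
    static final class Entry {
        final String key;
        Entry next;

        Entry(String key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Replays the schedule from the text; both entries are assumed to map to index 2
    static Entry[] replay() {
        Entry b = new Entry("b", null);
        Entry a = new Entry("a", b);   // old bucket: a -> b

        // Thread A: first iteration, line #1 executed, then descheduled
        Entry eA = a;
        Entry nextA = eA.next;         // nextA == b

        // Thread B: full transfer into its own table (head insertion reverses the list)
        Entry[] tableB = new Entry[4];
        for (Entry e = a; e != null; ) {
            Entry next = e.next;
            e.next = tableB[2];
            tableB[2] = e;
            e = next;
        }                              // now b.next == a and a.next == null

        // Thread A resumes with its stale locals and its own, still-empty table
        Entry[] tableA = new Entry[4];
        Entry e = eA;
        Entry next = nextA;
        e.next = tableA[2];            // rest of the first iteration
        tableA[2] = e;
        e = next;
        while (e != null) {
            next = e.next;             // line #1
            e.next = tableA[2];
            tableA[2] = e;
            e = next;
        }
        return new Entry[] {a, b};
    }

    // Floyd's cycle check: true if following next pointers never reaches null
    static boolean hasCycle(Entry head) {
        Entry slow = head;
        Entry fast = head;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Entry[] ab = replay();
        System.out.println(ab[0].next.key + " " + ab[1].next.key); // b a
        System.out.println(hasCycle(ab[0]));                        // true
    }
}
```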

How can HashMap's lack of thread safety be addressed?

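Use Hashtable. This is not recommended: Hashtable achieves thread safety by declaring all of its methods synchronized. Every operation then contends for the same lock, and that granularity is too coarse for highly concurrent workloads.
Wrap the HashMap with Collections.synchronizedMap(). Under the hood it works much like Hashtable: every method synchronizes on a single lock.
Use ConcurrentHashMap (recommended).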
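Here is a sketch of the recommended option (Java 8+). Several threads increment one counter through `ConcurrentHashMap.merge()`, which the JDK documents as being performed atomically. As a result, no updates are lost. `ConcurrentCounterDemo` and `countWith` are hypothetical names. With a plain HashMap, the same code gives no such guarantee.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCounterDemo {
    // Hypothetical helper: starts `threads` threads that each call
    // merge("hits", 1, Integer::sum) `perThread` times, waits for them, returns the count
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", ex);
            }
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        // ConcurrentHashMap.merge() is atomic, so every increment is counted
        System.out.println(countWith(new ConcurrentHashMap<>(), 4, 10_000)); // 40000
    }
}
```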
