集合专辑(七)：HashTable核心源码解读

最新推荐文章于 2024-10-15 15:06:01 发布

左灯右行的爱情

最新推荐文章于 2024-10-15 15:06:01 发布

阅读量160

点赞数

分类专栏：集合文章标签： java 哈希算法 jvm

本文链接：https://blog.csdn.net/qq_45852626/article/details/125968262

版权

集合专栏收录该内容

9 篇文章 0 订阅

订阅专栏

简单介绍

HashTable特点

1：键和值都不允许为null
2：使用方法和HashMap基本一样
3：hashTable是线程安全的，hashMap是线程不安全的

继承结构关系

在这里插入图片描述
代码实现如下：

public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable

DIctionary：表示键值存储库，操作与Map相似，但是已经过时了，在java的后续发展中，逐渐被Map取代了。

Doc解读

/**
 * This class implements a hash table, which maps keys to values. 
 * 这个类实现了Hash 表，它将key映射到值上
   Any non-<code>null</code> object can be used as a key or as a value. 
 *任何非空的对象都可以作为key或者value使用（换言之不允许有null出现）
 * To successfully store and retrieve objects from a hashtable, the
 * objects used as keys must implement the <code>hashCode</code>
 * method and the <code>equals</code> method. <p>
 *为了成功地从hash表中存储和检索对象，作为key的对象必需实现hashcode和equals方法。
 * An instance of <code>Hashtable</code> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>. 
 * 一个Hashtable的实力要有两个参数：初始容量和负载因子
 *  The <i>capacity</i> is the number of <i>buckets</i> in the hash table, and the
 * <i>initial capacity</i> is simply the capacity at the time the hash table
 * is created.  
 * 容量Capacity是hash表中桶的数量，初始容量则是在创建hash表的时候同时创建的
 * Note that the hash table is <i>open</i>: in the case of a "hash
 * collision", a single bucket stores multiple entries, which must be searched
 * sequentially.  
 * 需要注意的是，hash表是开放状态的，在hash冲突情况下，一个桶存储了多个条目，那么必需按顺序对这些条目进行搜索
 * 
 * The <i>load factor</i> is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
 * 负载因子是一个度量hash表在其容量自动增加之前可以达到完整程度。
 * 
 * The initial capacity and load factor parameters are merely hints to
 * the implementation.  
 * 初始容量和负载因子只是对实现的提示
 * The exact details as to when and whether the rehash
 * method is invoked are implementation-dependent.<p>
 *对于何时以及是否调用rehash扩容方法的确切细节取决于实现。
 * Generally, the default load factor (.75) offers a good tradeoff between
 * time and space costs. 
 * 通常情况下，负载因子默认为0.75，这是在时间和空间成本上的一个平衡。
 *  Higher values decrease the space overhead but
 * increase the time cost to look up an entry (which is reflected in most
 * <tt>Hashtable</tt> operations, including <tt>get</tt> and <tt>put</tt>).<p>
 *如果这个值太高虽然会降低空间的开销，但是检索的时候会增加时间成本，
 这在大多数哈希表的操作中，表现在get和put方法上。
 * The initial capacity controls a tradeoff between wasted space and the
 * need for <code>rehash</code> operations, which are time-consuming.
 * 初始容量是空间浪费和rehash操作的折衷，因为rehash非常耗时
 * No <code>rehash</code> operations will <i>ever</i> occur if the initial
 * capacity is greater than the maximum number of entries the <tt>Hashtable</tt> will contain divided    by its load factor.
 * 如果初始容量>hashtable将包含的最大条目和负载因子之商，那么不会发生任何rehash操作
 *   However,setting the initial capacity too high can waste space.<p>
 *然而，这会造成空间浪费
 * If many entries are to be made into a <code>Hashtable</code>,
 * creating it with a sufficiently large capacity may allow the
 * entries to be inserted more efficiently than letting it perform
 * automatic rehashing as needed to grow the table. <p>
如果许多条目被创建到hashtable中，创建一个足够大的容量允许这些条目插入，比让这些条目插入之后触发rehash操作更有效率。
 * This example creates a hashtable of numbers. It uses the names of
 * the numbers as keys:
 * 如下这个例子创建了一个数字的hash表，它使用名称做为数字的key。
 * <pre>   {@code
 *   Hashtable<String, Integer> numbers
 *     = new Hashtable<String, Integer>();
 *   numbers.put("one", 1);
 *   numbers.put("two", 2);
 *   numbers.put("three", 3);}</pre>
 *
 * <p>To retrieve a number, use the following code:  为了检索一个号码，用如下代码
 * <pre>   {@code
 *   Integer n = numbers.get("two");
 *   if (n != null) {
 *     System.out.println("two = " + n);
 *   }}</pre>
 *
 * <p>The iterators returned by the <tt>iterator</tt> method of the collections
 * returned by all of this class's "collection view methods" are
 * <em>fail-fast</em>:
 * 由此类所有"集合视图方法" 返回的集合的iterator方法返回的迭代器是fail-fast：
 *  if the Hashtable is structurally modified at any time
 * after the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a {@link
 * ConcurrentModificationException}.  
 * 如果在创建迭代器之后的任何时候对Hashtable进行结构修改，除非通过迭代器自己的remove方法，迭代器将抛出ConcurrentModificationException 。
 * 
 * Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the future.
因此，在并发修改的情况下，迭代器快速而干净地失败，而不是在未来的未确定时间冒任意，非确定性行为的风险。
 * The Enumerations returned by Hashtable's keys and elements methods are
 * <em>not</em> fail-fast.
 *  Hashtable的keys和elements方法返回的枚举不是快速失败的
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification. 
 * 请注意，迭代器的快速失败行为无法得到保证，因为一般来说，在存在不同步的并发修改时，不可能做出任何硬性保证。
 *  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * 失败快速迭代器以尽力而为的方式抛出ConcurrentModificationException 
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 * 因此，编写依赖于此异常的程序以确保其正确性是错误的： 迭代器的快速失败行为应该仅用于检测错误。

核心内部类Entry

Hashtable与Hashmap相似，Hashtable内部也是采用内部类来实现这些复杂的结构，主要是对键值对的封装
先看这个类的成员变量：

       final int hash;  //值的hashCode
        final K key;
        V value;
        Entry<K,V> next;    //表示下一个节点

再来看构造函数：

    protected Entry(int hash, K key, V value, Entry<K,V> next) {
            this.hash = hash;
            this.key =  key;
            this.value = value;
            this.next = next;
        }

比较核心的方法：

      public boolean equals(Object o) {
            if (!(o instanceof Map.Entry))    //先判断传入对象的类型
                return false;
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;

            return (key==null ? e.getKey()==null : key.equals(e.getKey())) &&
               (value==null ? e.getValue()==null : value.equals(e.getValue()));
        }

        public int hashCode() {
            return hash ^ Objects.hashCode(value);    //将hash和value的hashCode进行异或运算
            在HashMap中是key和value的hashCode进行异或运算，注意区分
            代码的结果实际上没有区别，但是hashMap中的hash属性已经被修改了，需要用高位部分混淆
            hashMap和HashTable中元素hash属性的计算是有区别的。

Entry全部代码：

private static class Entry<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Entry<K,V> next;

        protected Entry(int hash, K key, V value, Entry<K,V> next) {
            this.hash = hash;
            this.key =  key;
            this.value = value;
            this.next = next;
        }

        @SuppressWarnings("unchecked")
        protected Object clone() {
            return new Entry<>(hash, key, value,
                                  (next==null ? null : (Entry<K,V>) next.clone()));
        }

        // Map.Entry Ops

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }

        public V setValue(V value) {
            if (value == null)
                throw new NullPointerException();

            V oldValue = this.value;
            this.value = value;
            return oldValue;
        }

        public boolean equals(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;

            return (key==null ? e.getKey()==null : key.equals(e.getKey())) &&
               (value==null ? e.getValue()==null : value.equals(e.getValue()));
        }

        public int hashCode() {
            return hash ^ Objects.hashCode(value);
        }

        public String toString() {
            return key.toString()+"="+value.toString();
        }
    }

成员变量

  /**
     * The hash table data.
     */
    private transient Entry<?,?>[] table;

    /**
     * The total number of entries in the hash table.
     */
    private transient int count;    //当前Hashtable的实际大小

    /**
     * The table is rehashed when its size exceeds this threshold.  (The
     * value of this field is (int)(capacity * loadFactor).)
     *
     * @serial
     */
    private int threshold;   //扩容阈值 = capacity * loadFactor

    /**
     * The load factor for the hashtable.
     *
     * @serial
     */
    private float loadFactor;       //负载因子，默认0.75

HashMap以及HashTable相关存储元素的数组等属性都是transient修饰，在序列化的时候不会被序列化，而是类自己实现了序列化和反序列化的方法。
不同的虚拟机或者不同操作系统上，hashcode的算法可能会造成相同的值经过hash之后其hashcode不一致。这样就容易造成实际上反序列化之后的HashTable与之前的HashTable不同。

静态变量

/**
     * The number of times this Hashtable has been structurally modified
     * Structural modifications are those that change the number of entries in
     * the Hashtable or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the Hashtable fail-fast.  (See ConcurrentModificationException).
     */
    private transient int modCount = 0;  

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = 1421746759512286392L;

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

这里对为啥MAX_ARRAY_SIZE不是Integer.MAX_VALUE进行了解释。再某些jvm中，数组是需要一些头信息的，因此如果直接将int数组分配为Integer.MAX_VALUE，则会造成OOM。所以这里的长度是Integer.MAX_VALUE - 8;

HashTable的数据结构

在这里插入图片描述
我们学过了HashMap，现在来看HashTable就很简单了，内部构成是通过拉链法实现单向链表。

HashTable的构造函数

构造函数有四种：
1：无参构造

  /**
     * Constructs a new, empty hashtable with a default initial capacity (11)
     * and load factor (0.75).
     */
    public Hashtable() {
        this(11, 0.75f);
    }

2：有参（参数为初始容量,负载因子默认为0.75）

  public Hashtable(int initialCapacity) {
        this(initialCapacity, 0.75f);
    }

3：有参（参数是初始容量和负载因子）

  public Hashtable(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)        
            throw new IllegalArgumentException("Illegal Capacity: "+
                                               initialCapacity);
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal Load: "+loadFactor);

        if (initialCapacity==0)
            initialCapacity = 1;
        this.loadFactor = loadFactor;
        table = new Entry<?,?>[initialCapacity];
        threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
    }

第四种：有参（参数是集合）根据原有map产生一个新map构造方法

 public Hashtable(Map<? extends K, ? extends V> t) {
        this(Math.max(2*t.size(), 11), 0.75f);
        putAll(t);//putAll方法是一个同步方法，内部通过遍历，之后对put方法进行调用
    }

扩容原理

我们以put方法为引子，注意hashtable支持同步：

public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {     //判断值是否为null
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable.
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        //计算索引 
        int index = (hash & 0x7FFFFFFF) % tab.length;
       // 0x7FFFFFFF是Integer.Max_VALUE的值，这一步为了防止索引值超出整形范围，然后根据数组长度求余可以将索引控制在数组范围内
        @SuppressWarnings("unchecked")
        Entry<K,V> entry = (Entry<K,V>)tab[index];
        //下面的for循环是用来搜索HashTable是否包含当前要添加的对象，如果包含则直接更新value的值
        for(; entry != null ; entry = entry.next) {
            if ((entry.hash == hash) && entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }
       
        addEntry(hash, key, value, index);//把要添加的对象添加到HashTable
        return null;
    }

下面进入addEntry方法：

   private void addEntry(int hash, K key, V value, int index) {
        modCount++;    //目的是让fail-fast可以监控到

        Entry<?,?> tab[] = table;
        if (count >= threshold) {            count元素的数量，如果超过阈值则扩容
            // Rehash the table if the threshold is exceeded
            rehash();                
 //扩容方法，该操作会重新创建一个更大的数组，并重新分配所有的数据
            tab = table;            
            hash = key.hashCode();
            //重新计算索引
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry.
        @SuppressWarnings("unchecked")
        Entry<K,V> e = (Entry<K,V>) tab[index];
        //在索引处创建元素，将这个元素添加到链表的末尾
        tab[index] = new Entry<>(hash, key, value, e);
        count++;     
    }

我们最后去看扩容方法resize：

  protected void rehash() {
        int oldCapacity = table.length;//原始的table容量
        Entry<?,?>[] oldMap = table;     //原始的table

        // overflow-conscious code
        int newCapacity = (oldCapacity << 1) + 1;   //扩容长度是原来两倍+1
        if (newCapacity - MAX_ARRAY_SIZE > 0) {     //如果新容量大于最大值
            if (oldCapacity == MAX_ARRAY_SIZE)   //如果旧容量大于最大值
                // Keep running with MAX_ARRAY_SIZE buckets
                return;
            newCapacity = MAX_ARRAY_SIZE;   反之将新容量设置到最大值
        }
        Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];//创建新的数组

        modCount++;
        threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);//重新计算阈值
        table = newMap;

        for (int i = oldCapacity ; i-- > 0 ;) {
            for (Entry<K,V> old = (Entry<K,V>)oldMap[i] ; old != null ; ) {
                Entry<K,V> e = old;
                old = old.next;
               //注意这个计算槽位的方法
                int index = (e.hash & 0x7FFFFFFF) % newCapacity;
                e.next = (Entry<K,V>)newMap[index];
                newMap[index] = e;
            }
        }
    }

1:因为采用的是&运算，这样一来任何大于0x7FFFFFFF都将只保留0x7FFFFFFF覆盖的低位部分。高位部分会舍去。
2:由于Hashtable的长度不一定满足2的幂，因此其计算槽位不能用位运算。
3:这里直接用%。显然其效率要低于Hashmap中的位运算操作。
4:由于在Hashtable中，rehash并没提供给外部访问，而调用rehash的位置只有addEntry方法。因此这个方法没有加同步关键字。
5:另外HashTable并没提供缩容机制，也不存在HashMap中红黑树和链表互相转换的问题。因此其逻辑要简单得多。

结束语

1：HashMap是不支持同步的，不能用于并发场景。而Hashtable的对外的方法都是synchronized的，这样HashTable就能在同步的情况下使用。
2：如果不需要考虑线程安全问题可以使用 HashMap 作为替代，如果需要线程安全的高并发可以使用 ConcurrentHashMap，一般不需要使用 HashTable，所以这个类了解就好= = 。