A walkthrough of the Hashtable source code in .NET Framework 4.7.1

Latest recommended article published 2018-03-31 23:02:42

num24

Category: .NET

Article link: https://blog.csdn.net/num24/article/details/79431845

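This article dissects Hashtable in .NET Framework 4.7.1. It covers the basic idea of hashing, double hashing, the member variables in the source, the constructors, the insertion logic, the lookup steps, the deletion strategy, and the rehashing and growth mechanisms. By walking through the bucket structure and the collision-resolution method in detail, it shows why Hashtable works efficiently.

(Summary auto-generated by CSDN.)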

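Basic concepts

What hashing is

A hash function takes an input value and, through a hashing algorithm, produces an output of fixed length.

Handling hash collisions

There are many ways to resolve hash collisions, and they are not surveyed here. The source code uses double hashing, which is introduced briefly below.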

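Double hashing
Double hashing is a classic collision-resolution scheme for an open-addressing hash table T. Let n be the number of elements stored in T and m the capacity of T. The load factor of T is then

α = n / m, with 0 < α < 1.

Double hashing uses the value of a second hash function of the key as the step between probes. Let the two hash functions be h_1 and h_2. The probe sequence is:

h(i,k) = ( h_1(k) + i * h_2(k) ) % m, where m is the capacity of the hash table and i = 0, 1, ..., m - 1.

There are many ways to define h_2, but whichever is used, h_2(k) must be coprime with m (their greatest common divisor is 1, that is, they share no factor other than 1). Only then are the addresses of colliding keys (synonyms) spread evenly over the whole table. Otherwise the probe sequence may cycle through only part of the table. If m is prime, every value of h_2 from 1 to m - 1 is coprime with m.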
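The coprimality requirement can be checked with a small experiment. The following is a minimal sketch, not code from the framework: `DoubleHashingDemo`, `Probe`, and `DistinctSlots` are names made up for illustration, and h_1 and h_2 are assumed to be non-negative integers that have already been computed.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper (hypothetical names), not part of the .NET source.
static class DoubleHashingDemo
{
    // i-th probe position: (h1 + i * h2) % m, for i = 0 .. m - 1.
    public static int Probe(int h1, int h2, int i, int m)
    {
        return (int)(((long)h1 + (long)i * h2) % m);
    }

    // Counts how many distinct slots the full probe sequence reaches.
    public static int DistinctSlots(int h1, int h2, int m)
    {
        var seen = new HashSet<int>();
        for (int i = 0; i < m; i++)
            seen.Add(Probe(h1, h2, i, m));
        return seen.Count;
    }
}
```

With m = 11 (prime) and a step of 3, the sequence visits all 11 slots. With m = 12 and a step of 4, the greatest common divisor is 4, so the sequence only ever reaches 12 / 4 = 3 slots.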

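The entry, bucket, and slot concepts used in the source

An entry is the concrete record that holds the data to be inserted. The term is left as "entry" in the comments below.
A bucket is a struct (shown below) that stores three values: key, val, and hash_coll. It is referred to here as a hash bucket.
A slot is an abstraction of one position in the hash table. A slot is described as either empty or occupied.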

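Hash functions used in the source

h1(key) = GetHash(key); // GetHash calls the key's GetHashCode; if the key type does not override GetHashCode, the default implementation on Object is used.
h2(key) = 1 + ((h1(key) * HashPrime) % (hashsize - 1)) // HashPrime is 101
H(key, i) = (h1(key) + i * h2(key)) % hashsize // h2 acts as the increment; i starts at 0 and grows by 1 on every collision, as the entry insertion code shows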
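The three formulas above can be written out as a short sketch. This is an illustration under stated assumptions, not the framework's code: the class and method names are hypothetical, `key.GetHashCode()` stands in for `GetHash`, and the sign bit of the hash code is cleared so that the value is non-negative.

```csharp
using System;

// Illustrative sketch (hypothetical names), not the framework's InitHash.
static class HashFunctionsSketch
{
    const uint HashPrime = 101;

    // h1: the key's hash code with the sign bit cleared (assumption: no custom comparer).
    public static uint H1(object key)
    {
        return unchecked((uint)key.GetHashCode()) & 0x7FFFFFFF;
    }

    // h2: the step, always in the range 1 .. hashsize - 1. Requires hashsize >= 2.
    public static uint H2(uint h1, uint hashsize)
    {
        return unchecked(1 + ((h1 * HashPrime) % (hashsize - 1)));
    }

    // H(key, i): the bucket index of the i-th probe.
    public static int Probe(uint h1, uint h2, int i, uint hashsize)
    {
        return (int)(((ulong)h1 + (ulong)i * h2) % hashsize);
    }
}
```

For the integer key 42 and hashsize 11, h1 is 42 and h2 is 1 + (4242 % 10) = 3, so the first three probes land on buckets 9, 1, and 4. Because h2 is never 0 and hashsize is prime, the step is always coprime with the table size.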

Source code analysis

Selected member variables

        internal const Int32 HashPrime = 101; // prime used when initializing the hash (the h2 increment)
        private const Int32 InitialSize = 3;  // initial hash table size, used during initialization
        private struct bucket {
            public Object key;
            public Object val;
            public int hash_coll;   // Store hash code; sign bit means there was a collision.
        }
        private bucket[] buckets;
        // The total number of entries in the hash table.
        private  int count;

        // The total number of collision bits set in the hashtable
        private int occupancy;
        private  int loadsize;     // entry count threshold; the table must grow once it is exceeded
        private  float loadFactor; // load factor

        private volatile int version;             // version stamp, used when reading
        private volatile bool isWriterInProgress; // whether a writer is active, used when reading
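The hash_coll field packs two pieces of information into one int: the low 31 bits hold the hash code and the sign bit flags a collision. A minimal sketch of that packing follows. The helper names are hypothetical and exist only for illustration.

```csharp
using System;

// Illustrative helpers (hypothetical names) for the hash_coll bit layout.
static class HashCollSketch
{
    // Sets the sign bit to record that a collision happened at this bucket.
    public static int SetCollisionBit(int hashColl)
    {
        return hashColl | unchecked((int)0x80000000);
    }

    // A negative value means the sign bit is set.
    public static bool HasCollision(int hashColl)
    {
        return hashColl < 0;
    }

    // Masks off the sign bit to recover the stored hash code.
    public static int HashCode(int hashColl)
    {
        return hashColl & 0x7FFFFFFF;
    }
}
```

Storing the flag in the sign bit means no extra field is needed per bucket, at the cost of limiting the stored hash code to 31 bits.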

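Constructors

The implementation picks a suitable value from an array of primes as the initial hashsize and instantiates the bucket array with that length. One of the constructors is shown below.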

        // Constructs a new hashtable with the given initial capacity and load
        // factor. The capacity argument serves as an indication of the
        // number of entries the hashtable will contain. When this number (or an
        // approximation) is known, specifying it in the constructor can eliminate
        // a number of resizing operations that would otherwise be performed when
        // elements are added to the hashtable. The loadFactor argument
        // indicates the maximum ratio of hashtable entries to hashtable buckets.
        // Smaller load factors cause faster average lookup times at the cost of
        // increased memory consumption. A load factor of 1.0 generally provides
        // the best balance between speed and size.
        // 
        public Hashtable(int capacity, float loadFactor) {
            if (capacity < 0)
                throw new ArgumentOutOfRangeException("capacity", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
            if (!(loadFactor >= 0.1f && loadFactor <= 1.0f))
                throw new ArgumentOutOfRangeException("loadFactor", Environment.GetResourceString("ArgumentOutOfRange_HashtableLoadFactor", .1, 1.0));
            Contract.EndContractBlock();
    
            // Based on perf work, .72 is the optimal load factor for this table.
            this.loadFactor = 0.72f * loadFactor;

            double rawsize = capacity / this.loadFactor;
            if (rawsize > Int32.MaxValue) // would exceed the Int32 maximum
                throw new ArgumentException(Environment.GetResourceString("Arg_HTCapacityOverflow"));

            // Avoid awfully small sizes. A suitable value is picked from the primes array;
            // nothing is resized here, a size is only chosen.
            int hashsize = (rawsize > InitialSize) ? HashHelpers.GetPrime((int)rawsize) : InitialSize;
            buckets = new bucket[hashsize];

            loadsize = (int)(this.loadFactor * hashsize); // table length times load factor: the most entries the table should hold before it grows
            isWriterInProgress = false;
            // Based on the current algorithm, loadsize must be less than hashsize.
            Contract.Assert( loadsize < hashsize, "Invalid hashtable loadsize!");
        }
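The sizing arithmetic in this constructor can be tried in isolation. The sketch below is only an approximation under a stated assumption: `HashHelpers.GetPrime` is internal to the framework and reads from an array of primes, so a hypothetical `NextPrime` helper that returns the smallest prime at or above its argument is used instead. Actual bucket counts in the framework may therefore differ from what this sketch returns.

```csharp
using System;

// Illustrative sketch of the constructor's sizing math (hypothetical names).
static class SizingSketch
{
    const int InitialSize = 3;

    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Assumption: a stand-in for HashHelpers.GetPrime, not its real behavior.
    public static int NextPrime(int min)
    {
        int n = Math.Max(min, 2);
        while (!IsPrime(n)) n++;
        return n;
    }

    // Returns hashsize and reports loadsize, mirroring the constructor's steps.
    public static int Size(int capacity, float loadFactor, out int loadsize)
    {
        float effective = 0.72f * loadFactor;      // scaled load factor
        double rawsize = capacity / effective;     // buckets needed for 'capacity' entries
        int hashsize = (rawsize > InitialSize) ? NextPrime((int)rawsize) : InitialSize;
        loadsize = (int)(effective * hashsize);    // growth threshold
        return hashsize;
    }
}
```

With capacity 100 and load factor 1.0, rawsize is about 138.9, this stand-in picks 139 buckets, and loadsize becomes 100. With capacity 0 the table falls back to InitialSize, giving 3 buckets and a loadsize of 2. In both cases loadsize stays below hashsize, which is what the assertion at the end of the constructor checks.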

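Inserting data

The insertion code is essentially a single loop. If no suitable bucket is found, the bucket index is recomputed by adding the increment and the search continues. The loop runs while the number of probes is less than the length of the hash table.

Insertion analysis: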
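The shape of that loop can be sketched with a stripped-down table. This is a simplified illustration, not the framework's insertion method: `TinyTable` and its members are hypothetical, the table never grows, an existing key is overwritten, and the collision bits, deleted slots, and version bookkeeping are all left out. The table size is assumed to be a prime of at least 3.

```csharp
using System;

// Simplified illustration (hypothetical class), not the framework's insert code.
class TinyTable
{
    const uint HashPrime = 101;
    readonly object[] keys;
    readonly object[] vals;

    // Assumption: primeSize is a prime number >= 3.
    public TinyTable(int primeSize)
    {
        keys = new object[primeSize];
        vals = new object[primeSize];
    }

    public object KeyAt(int index) { return keys[index]; }

    // Returns the number of probes used, or -1 if every probe failed.
    public int Insert(object key, object val)
    {
        uint size = (uint)keys.Length;
        uint seed = unchecked((uint)key.GetHashCode()) & 0x7FFFFFFF;
        uint incr = unchecked(1 + ((seed * HashPrime) % (size - 1)));
        int bucket = (int)(seed % size);

        int ntry = 0;
        while (ntry < keys.Length)
        {
            // Suitable bucket: empty, or already holding this key.
            if (keys[bucket] == null || keys[bucket].Equals(key))
            {
                keys[bucket] = key;
                vals[bucket] = val;
                return ntry + 1;
            }
            // Collision: step forward by the increment and try again.
            bucket = (int)(((long)bucket + incr) % size);
            ntry++;
        }
        return -1;
    }
}
```

In an 11-bucket table, the key 9 lands in bucket 9 on the first probe. The key 20 also hashes to bucket 9, collides, and is placed in bucket 10 on the second probe because its increment is 1 + (2020 % 10) = 1.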