Dictionary

duanzhiyong666

Category: C#

Article link: https://blog.csdn.net/duanzhiyong666/article/details/89526100

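Dictionary<TKey, TValue>

Our new project uses dictionaries heavily, and I'm not very familiar with them, so I'm starting a new article to record some tips.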

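Overview

Each element is a key/value pair
Keys must be unique; values need not be
Keys and values can be of any type (but a key cannot be null)
Lookup by key is close to O(1) on average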
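A quick check of two of these rules, as a minimal sketch (KeyRulesDemo and its methods are hypothetical names used only for illustration): values may repeat or even be null, while a null key is rejected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: checks the key/value rules listed above.
public static class KeyRulesDemo
{
    // Values may repeat, and a reference-type value may even be null.
    public static int CountWithRepeatedValues()
    {
        var dic = new Dictionary<int, string>();
        dic.Add(1, "same");
        dic.Add(2, "same");
        dic.Add(3, null);
        return dic.Count;
    }

    // A null key is rejected with ArgumentNullException.
    public static bool NullKeyRejected()
    {
        var dic = new Dictionary<string, string>();
        try
        {
            dic.Add(null, "x");
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```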

Add, update, remove, and query

using System;
using System.Collections.Generic;

// Add (Add throws ArgumentException if the key already exists)
Dictionary<int, string> testDic = new Dictionary<int, string>();
testDic.Add(1, "Monday");
testDic.Add(2, "Tuesday");
testDic.Add(3, "Wednesday");

// Update via the indexer, then remove by key
testDic[1] = "test";
testDic.Remove(1);

// Enumerate keys and values separately
var keys = testDic.Keys;
foreach (var k in keys)
{
    Console.WriteLine("Key: " + k);
}
var values = testDic.Values;
foreach (var v in values)
{
    Console.WriteLine("Value: " + v);
}

// Enumerate key/value pairs, and check whether a key exists
foreach (KeyValuePair<int, string> keyvalue in testDic)
{
    Console.WriteLine("key: " + keyvalue.Key + " value: " + keyvalue.Value);
}
if (testDic.ContainsKey(2))
{
    Console.WriteLine("ok");
}
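For the query step, TryGetValue is usually preferable to ContainsKey followed by the indexer: it hashes the key and walks the chain once instead of twice, and it avoids the KeyNotFoundException that the indexer getter throws for a missing key. LookupDemo is a hypothetical helper used only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class LookupDemo
{
    // One lookup: TryGetValue reports whether the key exists and returns the value in the same call.
    public static string Describe(Dictionary<int, string> dic, int key)
    {
        if (dic.TryGetValue(key, out string value))
        {
            return key + " -> " + value;
        }
        return key + " not found";
    }

    // For comparison, the indexer getter throws when the key is missing.
    public static bool IndexerThrowsOnMissingKey(Dictionary<int, string> dic, int key)
    {
        try
        {
            _ = dic[key];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }
}
```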

Remove

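Removes an element by its key:

    public bool Remove(TKey key);
Returns true if the element was found and removed; otherwise false (for example, when the key is not in the dictionary).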
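A quick illustration of the return value (RemoveDemo is a hypothetical name): removing a present key returns true, and removing the same key again returns false without throwing.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class RemoveDemo
{
    public static (bool first, bool second, int countAfter) RemoveTwice()
    {
        var dic = new Dictionary<int, string> { { 1, "Monday" }, { 2, "Tuesday" } };
        bool first = dic.Remove(1);   // key present: removed, returns true
        bool second = dic.Remove(1);  // key already gone: returns false, no exception
        return (first, second, dic.Count);
    }
}
```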

How it works internally

The Entry struct

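private struct Entry {
    public int hashCode;    // Lower 31 bits of the key's hash code (sign bit cleared); -1 if this entry is unused
    public int next;        // Index of the next entry in the same chain; -1 if there is none
    public TKey key;        // The element's key
    public TValue value;    // The element's value
}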

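Other key private fields

private int[] buckets;                      // Hash buckets: index of the first entry in each chain, -1 if empty
private Entry[] entries;                    // Entry array that stores the elements
private int count;                          // Number of entry slots used so far (next never-used index in entries)
private int version;                        // Incremented on every modification; detects changes during enumeration
private int freeList;                       // Index of the most recently removed entry (head of the free list), -1 if none
private int freeCount;                      // Number of removed entries whose slots are free for reuse
private IEqualityComparer<TKey> comparer;   // Key comparer (supplies GetHashCode and Equals)
private KeyCollection keys;                 // Collection returned by Keys
private ValueCollection values;             // Collection returned by Values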

The most important fields are the two arrays buckets and entries; they are the heart of Dictionary.

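The Add operation

Assume buckets and entries both have length 4 (a simplification; the real sizes are prime numbers).
Execute Add("a", "b"):
1. Compute the hash code from the key; assume GetHashCode("a") = 6.
2. Take the hash code modulo the number of buckets to decide which bucket it goes in: there are 4 buckets (buckets.Length = 4), and 6 % 4 = 2, so it goes into the bucket at index 2, buckets[2].
3. Leaving aside slots freed by earlier removals (covered later), the HashCode, Key, Value, and so on are stored in entries[count]. count increases sequentially and count++ moves it to the next slot; here count = 0, so the data goes into entries[0].
4. Store the entry's index in the bucket: here buckets[2] = 0.
5. Finally version++: the collection has changed, so its version goes up by one. Every add, remove, or update changes the version.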
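Steps 1 and 2 boil down to one line of arithmetic. This small sketch (BucketIndex is a hypothetical helper name) reproduces 6 % 4 = 2 and shows why the reference source masks the hash code with 0x7FFFFFFF: GetHashCode may return a negative number, and in C# the remainder of a negative number is negative, which is not a valid array index.

```csharp
// Hypothetical helper mirroring steps 1-2: clear the sign bit, then take the remainder.
public static class BucketMath
{
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        // GetHashCode may be negative, and in C# a negative % result is negative too (-6 % 4 == -2).
        // Masking with 0x7FFFFFFF keeps the low 31 bits, so the result is always a valid index.
        int nonNegative = hashCode & 0x7FFFFFFF;
        return nonNegative % bucketCount;
    }
}
```

Masking is safer than Math.Abs here: Math.Abs(int.MinValue) throws an OverflowException, while int.MinValue & 0x7FFFFFFF is simply 0.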

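After this Add, buckets[2] = 0, and entries[0] holds hash code 6, key "a", and value "b".
In practice, hash collisions are common.
Now call Add("c", "d"). The hash code of "c" is also 6, so it also maps to buckets[2]; steps 1-3 are the same as before.

If step 4 simply set buckets[2] = 1, the existing link buckets[2] -> entries[0] would be lost. That is what Entry.next is for.
The new entry's next points at the element the bucket pointed to before, and buckets[index] then points at the new entry, forming a linked list with the newest entry at its head: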

entries[index].next = buckets[targetBucket];
...
buckets[targetBucket] = index;
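A toy model of this head insertion, reproducing the "a"/"c" collision. It is a simplified sketch, not the real class: ChainModel is a hypothetical name, hash codes are passed in by the caller so the assumed value 6 can be used for both keys, and there is no resizing or duplicate check, so adding more than size keys throws IndexOutOfRangeException.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model: hash codes are supplied by the caller, no resizing or duplicate checks.
public sealed class ChainModel
{
    private readonly int[] buckets; // index of the head entry of each chain, -1 = empty bucket
    private readonly int[] next;    // next entry in the same chain, -1 = end of chain
    private readonly string[] keys;
    private int count;              // next never-used entry slot

    public ChainModel(int size)
    {
        buckets = new int[size];
        for (int i = 0; i < size; i++) buckets[i] = -1;
        next = new int[size];
        keys = new string[size];
    }

    public void Add(string key, int hashCode)
    {
        int targetBucket = (hashCode & 0x7FFFFFFF) % buckets.Length;
        int index = count++;
        keys[index] = key;
        next[index] = buckets[targetBucket]; // new entry points at the old head
        buckets[targetBucket] = index;       // the bucket now points at the new entry
    }

    public int Head(int bucket) => buckets[bucket];
    public int Next(int entry) => next[entry];

    // Keys in one bucket, in chain order (head first).
    public List<string> Chain(int bucket)
    {
        var result = new List<string>();
        for (int i = buckets[bucket]; i >= 0; i = next[i]) result.Add(keys[i]);
        return result;
    }
}
```

After Add("a", 6) and Add("c", 6) on a model of size 4, Head(2) is 1, Next(1) is 0, and Chain(2) lists "c" before "a": the newest entry is always at the head of its chain.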


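The Find operation

Add another pair: Add("e", "f"), where GetHashCode("e") = 7 and 7 % buckets.Length = 3. The structure is now: buckets[2] = 1 -> entries[1] ("c") -> entries[0] ("a"), and buckets[3] = 2 -> entries[2] ("e").
Calling dictionary.GetValueOrDefault("a") (the internal method shown below) goes through these steps:
1. Get the key's hash code and compute its bucket: the hash code of "a" is 6, so the bucket is buckets[2].
2. buckets[2] = 1 leads to entries[1]. Compare its key with "a"; if they differ, follow entries[i].next and continue until the key is found or next = -1. Here the search ends with entryIndex = 0.
3. If entryIndex >= 0, return entries[entryIndex].value; otherwise return default(TValue). Here that means returning entries[0].value.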

The lookup code:
// Locate an element's index in entries
private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF; // Hash code with the sign bit cleared
        // int i = buckets[hashCode % buckets.Length]: find the bucket and take the index of its first entry
        // i >= 0; i = entries[i].next: walk the singly linked chain
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            // Return as soon as both the hash code and the key match
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}
...
internal TValue GetValueOrDefault(TKey key) {
    int i = FindEntry(key);
    // i >= 0 means the element was found: return its value
    // otherwise return the type's default value
    if (i >= 0) {
        return entries[i].value;
    }
    return default(TValue);
}
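The same walk can be traced over hand-built arrays that match the a/c/e example (4 buckets, assumed hash codes 6, 6, 7). FindDemo and its arrays are hypothetical; they only mimic the layout of buckets and entries, not the real class.

```csharp
using System.Collections.Generic;

// Hypothetical hand-built state after Add("a","b"), Add("c","d"), Add("e","f")
// with 4 buckets and the assumed hash codes 6, 6, 7.
public static class FindDemo
{
    private static readonly int[] Buckets = { -1, -1, 1, 2 };  // buckets[2] -> entry 1, buckets[3] -> entry 2
    private static readonly int[] HashCodes = { 6, 6, 7 };
    private static readonly int[] NextIndex = { -1, 0, -1 };   // entry 1 ("c") -> entry 0 ("a")
    private static readonly string[] Keys = { "a", "c", "e" };
    private static readonly string[] Values = { "b", "d", "f" };

    // Same walk as FindEntry: pick the bucket, then follow next until hash code and key both match.
    public static int FindEntry(string key, int hashCode)
    {
        hashCode &= 0x7FFFFFFF;
        for (int i = Buckets[hashCode % Buckets.Length]; i >= 0; i = NextIndex[i])
        {
            // The cheap int comparison runs first; Equals is only called when the hash codes match.
            if (HashCodes[i] == hashCode && EqualityComparer<string>.Default.Equals(Keys[i], key))
                return i;
        }
        return -1;
    }

    public static string GetValueOrDefault(string key, int hashCode)
    {
        int i = FindEntry(key, hashCode);
        return i >= 0 ? Values[i] : null; // null is default(string)
    }
}
```

FindEntry("a", 6) visits entries[1] ("c") first, fails the Equals check, follows next to entries[0], and returns 0.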

The Remove operation

public bool Remove(TKey key) {
    if(key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets != null) {
        // 1. Get the hash code from the key
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        // 2. Take the remainder to find the bucket
        int bucket = hashCode % buckets.Length;
        // last is the index of the previous entry in the chain (-1 while i is still the head)
        int last = -1;
        // 3. Walk the bucket's singly linked chain
        for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
                // 4. Found it. If last < 0, the entry is the head of the chain, so point the bucket at entries[i].next
                if (last < 0) {
                    buckets[bucket] = entries[i].next;
                }
                else {
                    // 4.1 Otherwise the entry is in the middle or at the end of the chain:
                    // link its predecessor to its successor so the chain is not broken
                    entries[last].next = entries[i].next;
                }
                // 5. Reset the Entry's data
                entries[i].hashCode = -1;
                // 5.1 Push the entry onto the freeList chain
                entries[i].next = freeList;
                entries[i].key = default(TKey);
                entries[i].value = default(TValue);
                // *6. Key line: freeList now points at this entry, so the next Add reuses this slot first
                freeList = i;
                freeCount++;
                // 7. Bump the version
                version++;
                return true;
            }
        }
    }
    return false;
}
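A stripped-down model of the free list, keeping only the slot bookkeeping from Remove and Insert (FreeListModel is a hypothetical name; there are no keys, buckets, or capacity checks). It shows that freed slots are reused last-in, first-out before count advances again.

```csharp
// Hypothetical model of the free list only: no keys, buckets, or capacity checks.
public sealed class FreeListModel
{
    private readonly int[] next;   // plays the role of entries[i].next
    private int count;             // slots handed out from the never-used region
    private int freeList = -1;     // most recently freed slot, -1 = none
    private int freeCount;         // number of freed slots waiting for reuse

    public FreeListModel(int size)
    {
        next = new int[size];
    }

    public int FreeCount => freeCount;

    // Mirrors Insert: prefer a freed slot, otherwise take slot `count`.
    public int Allocate()
    {
        int index;
        if (freeCount > 0)
        {
            index = freeList;
            freeList = next[index]; // pop the head of the free list
            freeCount--;
        }
        else
        {
            index = count;
            count++;
        }
        next[index] = -1; // in the real Insert, next is set to the bucket's old head here
        return index;
    }

    // Mirrors the end of Remove: push the slot onto the free list.
    public void Free(int index)
    {
        next[index] = freeList;
        freeList = index;
        freeCount++;
    }
}
```

After allocating slots 0, 1, 2 and freeing 0 then 2, the next three allocations return 2, 0, and finally 3: the most recently freed slot comes back first, and count only advances once the free list is empty.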

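Note that after Remove runs, version, freeList, and freeCount have all been updated.

Resize (growing the arrays)

A resize is triggered when entries is full (count == entries.Length; buckets always has the same length). In the reference source, exceeding HashCollisionThreshold also calls Resize, but at the same size and with a new randomized comparer (see the end of Insert below).
When keys pile up in one bucket, for example all in buckets[3] after many hash collisions, lookups degrade to O(n). The number of collisions therefore has a threshold:
HashCollisionThreshold = 100; when an insertion walks past more than this many entries in its chain, that rehash is triggered.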
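To see this worst case with the real Dictionary<TKey, TValue>, a deliberately bad IEqualityComparer<int> can send every key to the same bucket. The comparer and helper names are hypothetical. The dictionary stays correct; only the cost changes, because each lookup walks one long chain. In the reference source shown here the collision-triggered rehash only applies when HashHelpers.IsWellKnownEqualityComparer(comparer) is true, so a custom comparer like this one is never rescued by it.

```csharp
using System.Collections.Generic;

// Hypothetical, deliberately bad comparer: every key gets hash code 42, so all entries share one chain.
public sealed class ConstantHashComparer : IEqualityComparer<int>
{
    public int EqualsCalls; // counts Equals calls, a rough measure of how far lookups walk

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int obj) => 42;
}

public static class CollisionDemo
{
    public static (bool allFound, int equalsCalls) Run(int n)
    {
        var comparer = new ConstantHashComparer();
        var dic = new Dictionary<int, int>(comparer);
        for (int i = 0; i < n; i++) dic.Add(i, i * 2);

        comparer.EqualsCalls = 0; // only count the lookups below
        bool allFound = true;
        for (int i = 0; i < n; i++)
        {
            allFound &= dic.TryGetValue(i, out int v) && v == i * 2;
        }
        return (allFound, comparer.EqualsCalls);
    }
}
```

With the head insertion described earlier, looking up key i compares against every key added after it, so Run(n) should perform on the order of n²/2 Equals calls, where a well-distributed hash would need roughly one per lookup.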

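How it works:
Assume the Dictionary has size 2 and the collision threshold is also 2.

1. Allocate new buckets and entries arrays about twice as large (or of the same size, for a collision-triggered rehash) and copy the existing entries into the new entries array. In the reference source, Resize() takes the new size from HashHelpers.ExpandPrime(count), a prime that is at least 2 × count.
2. For a collision-triggered resize, recompute every hash code with the new hash function. Recomputing does not necessarily fix the problem of all keys landing in one bucket.
3. For each entry, bucket = newEntries[i].hashCode % newSize determines its position in the new buckets.
4. Rebuild the hash chains: newEntries[i].next = buckets[bucket]; buckets[bucket] = i;
Because buckets has changed size as well, every hash code must be mapped to a bucket again, and the singly linked chains are rebuilt from scratch.

In the JDK (HashMap since Java 8), a bucket whose linked list grows too long is converted into a red-black tree. .NET does not do this, and every resize walks all elements again, so it is best to estimate a capacity when creating the dictionary.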
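Following the advice to estimate a size up front: the Dictionary<TKey, TValue>(int capacity) constructor allocates buckets and entries before any Add, so filling the dictionary up to that count should not trigger Resize (in the reference source, the capacity is rounded up to a prime by HashHelpers.GetPrime). PresizeDemo is a hypothetical helper.

```csharp
using System.Collections.Generic;

// Hypothetical helper: presize the dictionary for the number of elements you expect.
public static class PresizeDemo
{
    public static Dictionary<int, string> Build(int expectedCount)
    {
        // The capacity constructor allocates buckets and entries up front,
        // so these Adds should not need to grow the arrays.
        var dic = new Dictionary<int, string>(expectedCount);
        for (int i = 0; i < expectedCount; i++)
        {
            dic.Add(i, "value" + i);
        }
        return dic;
    }
}
```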

private void Resize(int newSize, bool forceNewHashCodes) {
    Contract.Assert(newSize >= entries.Length);
    // 1. Allocate the new buckets and entries (every bucket starts empty, -1)
    int[] newBuckets = new int[newSize];
    for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
    Entry[] newEntries = new Entry[newSize];
    // 2. Copy the existing entries into the new entries array
    Array.Copy(entries, 0, newEntries, 0, count);
    // 3. For a collision-triggered resize, recompute the hash codes with the new comparer
    if(forceNewHashCodes) {
        for (int i = 0; i < count; i++) {
            if(newEntries[i].hashCode != -1) {
                newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
            }
        }
    }
    // 4. Determine each entry's new bucket
    // 5. Rebuild the hash chains (freed entries, with hashCode == -1, are skipped)
    for (int i = 0; i < count; i++) {
        if (newEntries[i].hashCode >= 0) {
            int bucket = newEntries[i].hashCode % newSize;
            newEntries[i].next = newBuckets[bucket];
            newBuckets[bucket] = i;
        }
    }
    buckets = newBuckets;
    entries = newEntries;
}
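Steps 4 and 5 of Resize can be tried in isolation. RehashDemo is a hypothetical sketch that takes only the cached hash codes (-1 marks a freed slot) and rebuilds buckets and next for a new size, using the same head insertion as Add.

```csharp
// Hypothetical sketch of Resize steps 4-5: only the cached hash codes are modelled (-1 marks a freed slot).
public static class RehashDemo
{
    public static (int[] buckets, int[] next) Rebuild(int[] hashCodes, int newSize)
    {
        int[] buckets = new int[newSize];
        for (int i = 0; i < newSize; i++) buckets[i] = -1; // every bucket starts empty

        int[] next = new int[hashCodes.Length];
        for (int i = 0; i < hashCodes.Length; i++)
        {
            next[i] = -1;
            // Freed slot: skipped. (The real Resize leaves its next untouched so the free list survives.)
            if (hashCodes[i] < 0) continue;
            int bucket = hashCodes[i] % newSize;  // step 4: new bucket for this hash code
            next[i] = buckets[bucket];            // step 5: head insertion, same as Add
            buckets[bucket] = i;
        }
        return (buckets, next);
    }
}
```

For hash codes {6, 6, 7, -1} and newSize 7 (a prime, like the real sizes), entries 0 and 1 still share bucket 6 (6 % 7 = 6), now in the chain 1 -> 0, while entry 2 moves to bucket 0 (7 % 7 = 0).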

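Add revisited

count advances via count++ to point at the next entry that has never been used. Once elements have been removed, there are free entries below count that should be reused, which is why Remove maintains freeList and freeCount: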

private void Insert(TKey key, TValue value, bool add) {

    if( key == null ) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets == null) Initialize(0);
    // Get the hash code from the key
    int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
    // Compute the target bucket index
    int targetBucket = hashCode % buckets.Length;
    // Number of collisions
    int collisionCount = 0;
    for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {
        if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
            // For Add (add == true), finding an equal key is an error: throw
            if (add) {
                ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
            }
            // Otherwise this is an indexer assignment such as dictionary["foo"] = "foo":
            // overwrite the value, bump the version, and return
            entries[i].value = value;
            version++;
            return;
        }
        // Every entry walked past counts as one collision
        collisionCount++;
    }
    int index;
    // If there are removed elements, put the new element into a freed slot
    if (freeCount > 0) {
        index = freeList;
        freeList = entries[index].next;
        freeCount--;
    }
    else {
        // If entries is full, grow the arrays
        if (count == entries.Length)
        {
            Resize();
            // buckets.Length has changed, so the target bucket must be recomputed
            targetBucket = hashCode % buckets.Length;
        }
        index = count;
        count++;
    }

    // Fill in the entry and make it the head of the bucket's chain
    entries[index].hashCode = hashCode;
    entries[index].next = buckets[targetBucket];
    entries[index].key = key;
    entries[index].value = value;
    buckets[targetBucket] = index;
    // Bump the version
    version++;

    // If the collision count exceeds the threshold (and the comparer is a well-known one),
    // switch to a randomized comparer and rehash at the same size
    if(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
    {
        comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
        Resize(entries.Length, true);
    }
}
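The add flag is the only difference between Add and the indexer setter in the reference source: Add passes true and throws on a duplicate key, while dic[key] = value passes false and overwrites. A small demo of both paths through the public API (AddVsIndexerDemo is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class AddVsIndexerDemo
{
    public static (string value, int count, bool addThrew) Run()
    {
        var dic = new Dictionary<string, string>();
        dic["foo"] = "first";   // key missing: the indexer setter inserts it
        dic["foo"] = "second";  // key present: the value is overwritten, Count stays 1
        bool addThrew = false;
        try
        {
            dic.Add("foo", "third"); // Add on an existing key throws
        }
        catch (ArgumentException)
        {
            addThrew = true;
        }
        return (dic["foo"], dic.Count, addThrew);
    }
}
```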

version

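version is checked during enumeration: if it changes while the dictionary is being enumerated (because of an add, remove, or update), the enumerator throws an InvalidOperationException. (Since .NET Core 3.0, Remove and Clear no longer invalidate an ongoing enumeration.)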
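A minimal sketch of the version check (VersionDemo is a hypothetical name). It adds a new key inside foreach: the add bumps version, and the enumerator's next MoveNext call sees the mismatch and throws.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class VersionDemo
{
    public static bool AddDuringEnumerationThrows()
    {
        var dic = new Dictionary<int, string> { { 1, "Monday" }, { 2, "Tuesday" } };
        try
        {
            foreach (var pair in dic)
            {
                // Adding a new key bumps version; the enumerator's next MoveNext sees the mismatch.
                dic.Add(pair.Key + 100, "new");
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Adding a key is used here because it invalidates the enumerator on every runtime version, unlike Remove.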