C# Basics: Dictionary (Part 1)

Author: 虫虫!

First published 2022-07-30 00:03:33, last modified 2023-11-24 10:55:41

Columns: C#, Source Code Study · Tags: c#, programming languages, hashing algorithms

Article link: https://blog.csdn.net/qq_41044598/article/details/126064451


Contents

Preface
1. Basic Usage of Dictionary
2. Underlying Implementation of Dictionary
3. Thread Safety
Summary

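Preface

Dictionary is one of the containers used most often in C# development, and its underlying implementation is a frequent interview question, which shows how important it is. Today we will read the Dictionary source code and see how it is designed under the hood.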

1. Basic Usage of Dictionary

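using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        // 1. Declaration
        // Key and Value can be any type
        Dictionary<int, string> _testDic = new Dictionary<int, string>();

        // 2. Adding elements
        _testDic.Add(24, "Canon");
        // Note: a given key can only be added once
        // _testDic.Add(24, "Jason"); // throws System.ArgumentException: an item with the same key has already been added
        // Use ContainsKey to check whether the key already exists
        if (!_testDic.ContainsKey(24)) _testDic.Add(24, "Canon");

        // 3. Removing elements
        // Remove does not throw for a missing key; it just returns false
        _testDic.Remove(24);
        _testDic.Add(24, "Canon"); // add the key back so the lookups below succeed

        // 4. Reading values
        // The indexer throws KeyNotFoundException if the key is not in the dictionary
        string str = _testDic[24];
        // TryGetValue returns true and assigns str on success;
        // otherwise it returns false and sets str to default (null for string)
        bool isExist = _testDic.TryGetValue(24, out str);

        // 5. Updating values
        // The indexer setter overwrites an existing value and adds the key if it is missing;
        // check ContainsKey first if you only want to update existing entries
        if (_testDic.ContainsKey(24)) _testDic[24] = "Jason";

        // 6. Traversal
        // Keys
        foreach (var key in _testDic.Keys) Console.WriteLine("Key = {0}", key);
        // Values
        foreach (var value in _testDic.Values) Console.WriteLine("Value = {0}", value);
        // foreach over key/value pairs
        foreach (var kvp in _testDic) Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, kvp.Value);
        // Explicit enumerator
        var enumerator = _testDic.GetEnumerator();
        while (enumerator.MoveNext())
        {
            var kvp = enumerator.Current;
            Console.WriteLine("Key = {0}", kvp.Key);
            Console.WriteLine("Value = {0}", kvp.Value);
        }

        // 7. Clearing
        _testDic.Clear();
    }
}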
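The comments above make several claims about edge cases (duplicate Add, Remove of a missing key, TryGetValue on a miss, the indexer setter). The following sketch checks each claim directly so you can run it yourself; the class name `DictionaryBehaviour` and its method names are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;

// DictionaryBehaviour is a hypothetical helper used only to verify the edge cases described above.
public static class DictionaryBehaviour
{
    // Adding the same key twice throws ArgumentException.
    public static bool DuplicateAddThrows()
    {
        var dic = new Dictionary<int, string> { { 24, "Canon" } };
        try
        {
            dic.Add(24, "Jason");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Remove on a missing key returns false instead of throwing.
    public static bool RemoveMissingKey()
    {
        var dic = new Dictionary<int, string>();
        return dic.Remove(24);
    }

    // Reading a missing key through the indexer throws KeyNotFoundException.
    public static bool IndexerMissingThrows()
    {
        var dic = new Dictionary<int, string>();
        try
        {
            string unused = dic[24];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }

    // TryGetValue on a missing key returns false and sets the out value to default (null for string).
    public static (bool found, string value) TryGetMissing()
    {
        var dic = new Dictionary<int, string>();
        bool found = dic.TryGetValue(24, out string value);
        return (found, value);
    }

    // The indexer setter inserts a missing key and overwrites an existing one.
    public static string IndexerUpsert()
    {
        var dic = new Dictionary<int, string>();
        dic[24] = "Canon"; // key absent: inserted
        dic[24] = "Jason"; // key present: overwritten
        return dic[24];
    }
}
```

In short: use Add when a duplicate key indicates a bug, and the indexer setter when "insert or overwrite" is what you want.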

2. Underlying Implementation of Dictionary

Link: C# Dictionary source code

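Hash function

Dictionary builds its Key-to-Value mapping with a hash function.
What is a hash function?
A hash function converts an object into a deterministic value: the same input always produces the same output, and keys that are equal must produce equal values. The value is not guaranteed to be unique, though; different keys can share it.
That value is called the hash code (hashCode). In C#, GetHashCode returns a 32-bit signed int, so a hash code can be negative.
Loosely, it is like the ID card number assigned to each person.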
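To make those properties concrete, here is a small sketch using `EqualityComparer<T>.Default`, which is how Dictionary obtains hash codes when no comparer is supplied. The class `HashCodeDemo` is hypothetical. It assumes the standard .NET behaviour that `Int32.GetHashCode` returns the value itself and that `Int64.GetHashCode` XORs the low and high 32 bits, which is why two different `long` keys can collide.

```csharp
using System.Collections.Generic;

// HashCodeDemo is a hypothetical helper used only to illustrate hash-code properties.
public static class HashCodeDemo
{
    // Equal keys must produce equal hash codes. (String hash codes are randomized
    // per process in .NET Core and later, so never persist them.)
    public static bool EqualKeysShareHashCode(string a, string b)
    {
        var comparer = EqualityComparer<string>.Default;
        return comparer.Equals(a, b) && comparer.GetHashCode(a) == comparer.GetHashCode(b);
    }

    // Int32's hash code is the value itself, so a hash code can be negative.
    public static int IntHashCode(int key) => EqualityComparer<int>.Default.GetHashCode(key);

    // Int64 folds 64 bits into 32, so distinct keys can share a hash code:
    // 0L and 4294967297L (0x1_0000_0001) both hash to 0.
    public static int LongHashCode(long key) => EqualityComparer<long>.Default.GetHashCode(key);
}
```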

Hash buckets

The source shows that Dictionary stores its data in arrays:

private struct Entry {
    public int hashCode;    // Lower 31 bits of hash code, -1 if unused
    public int next;        // Index of next entry, -1 if last
    public TKey key;        // Key of entry
    public TValue value;    // Value of entry
}

private int[] buckets;      // hash bucket array
private Entry[] entries;    // entry (data) array
...

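Key hash codes span a huge range, so we cannot declare an array large enough for every hashCode to get its own index. That is where hash buckets come in.
Hash codes are sorted into "buckets", which shrinks the range of indexes.
Each time we get a hashCode, we look for the data in the corresponding bucket.

A simple analogy: suppose you know someone's ID number and want to find their address.
Searching the national database with the ID number would be slow, but the ID number also tells you which province or region the person belongs to, and searching that region's database is much faster.

Concretely, we declare a buckets[] array and take the remainder hashCode % BucketSize (the array length) to get an index targetBucket, so every hashCode maps to some buckets[targetBucket].
Because hashCode is made non-negative first and the remainder is always smaller than the array length, targetBucket never goes outside the array's index range.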

// 0x7FFFFFFF is 0111...1 in binary (a 0 followed by 31 ones); the AND clears the sign bit so hashCode is non-negative
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;

Suppose the buckets array has length 3 (in fact, when no capacity is specified, 3 is the length used; the next article covers this).
Then targetBucket can only ever be 0, 1, or 2.
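A worked example of the two lines above with a bucket array of length 3. `BucketMath` is a hypothetical helper; for `int` keys the hash code is the key itself, so keys 24, 25 and 26 land in buckets 0, 1 and 2.

```csharp
// BucketMath is a hypothetical helper that mirrors the hashCode / targetBucket computation above.
public static class BucketMath
{
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        // Clear the sign bit first; otherwise a negative hash code would give a negative remainder.
        int nonNegative = hashCode & 0x7FFFFFFF;
        return nonNegative % bucketCount;
    }
}
```

The mask matters because in C# the remainder takes the sign of the dividend: `-1 % 3` is `-1`, an invalid array index, while `BucketIndex(-1, 3)` computes `2147483647 % 3`, which is `1`.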

Hash collisions

With this approach, different hashCodes can map to the same bucket; that is a hash collision. The figure below illustrates hash collisions:
[Figure: hash collision diagram]
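The red part of the figure shows collisions: several hashCodes map to the same bucket index.
Note that the buckets are themselves stored in an array, and each array slot can hold only one value.
So how do we find the hashCode we want within a single bucket?
In other words, how do we resolve hash collisions?
C#'s Dictionary resolves them with separate chaining.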
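A small sketch of how collisions arise: it distributes integer keys over a fixed number of buckets using the same mask-and-remainder formula. `CollisionDemo` is hypothetical, and it uses a `List<int>` per bucket purely for display; the real Dictionary does not store lists.

```csharp
using System.Collections.Generic;

// CollisionDemo is a hypothetical helper that shows which keys would share a bucket.
public static class CollisionDemo
{
    public static List<int>[] Distribute(IEnumerable<int> keys, int bucketCount)
    {
        var buckets = new List<int>[bucketCount];
        for (int b = 0; b < bucketCount; b++) buckets[b] = new List<int>();

        foreach (int key in keys)
        {
            // For int keys the default comparer's hash code is the key itself.
            int hashCode = EqualityComparer<int>.Default.GetHashCode(key) & 0x7FFFFFFF;
            buckets[hashCode % bucketCount].Add(key);
        }
        return buckets;
    }
}
```

With keys 1, 4, 7 and 2 and three buckets, bucket 0 stays empty, bucket 1 receives 1, 4 and 7 (three colliding keys), and bucket 2 receives 2.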

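Separate chaining

Colliding elements are linked into a singly linked list, and the position of the list's head is stored in the corresponding bucket, so each "bucket" holds just one value. After locating the bucket, we find the element by walking the list. In Dictionary the "pointers" are array indexes: the list nodes live in the entries array and link to each other through their next field.

When a bucket does not point to any entry, its value is -1 (all buckets are initialized to -1).

When we insert data into the dictionary, we first get the key's hashCode and use it to find the bucket index targetBucket; buckets[targetBucket] holds the index of the entry at the head of that bucket's chain.
With that index we can fetch the data we want from the entries array.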
Here is how the chain is built (head insertion):

    ...
    // index is usually the most recently freed slot, or otherwise the next unused position in the entries array
    entries[index].hashCode = hashCode;
    entries[index].next = buckets[targetBucket];
    entries[index].key = key;
    entries[index].value = value;
    buckets[targetBucket] = index;

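1. On insertion, first find a free position index in the entries array and assign hashCode.
2. Assign the current value of buckets[targetBucket] to the next field, so the old head becomes the new node's successor. (When the first item goes into a bucket, next is -1, meaning the chain had no head yet.)
3. Assign key and value.
4. Update buckets[targetBucket] to index, so the inserted entry becomes the new head.
From then on, every newly inserted node becomes the new head, and its next field points to the node that was the head before it.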
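The four steps above can be sketched as a tiny standalone class. `ChainSketch` is hypothetical and deliberately incomplete: fixed capacity, no resizing, no duplicate-key check, no removal and no free list, which the real Dictionary all handle. It keeps the same `Entry` layout and `-1` sentinel as the source.

```csharp
using System;
using System.Collections.Generic;

// ChainSketch is a hypothetical, simplified model of head insertion; not the real Dictionary.
public class ChainSketch<TKey, TValue>
{
    private struct Entry
    {
        public int hashCode; // non-negative hash code
        public int next;     // index of the next entry in the chain, -1 if last
        public TKey key;
        public TValue value;
    }

    private readonly int[] buckets;
    private readonly Entry[] entries;
    private readonly IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
    private int count;

    public ChainSketch(int bucketCount, int capacity)
    {
        buckets = new int[bucketCount];
        for (int i = 0; i < buckets.Length; i++) buckets[i] = -1; // -1: bucket is empty
        entries = new Entry[capacity];
    }

    public void Add(TKey key, TValue value)
    {
        if (count == entries.Length) throw new InvalidOperationException("this sketch does not resize");
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        int targetBucket = hashCode % buckets.Length;
        int index = count++;                         // step 1: next unused slot

        entries[index].hashCode = hashCode;
        entries[index].next = buckets[targetBucket]; // step 2: old head becomes our successor
        entries[index].key = key;                    // step 3
        entries[index].value = value;
        buckets[targetBucket] = index;               // step 4: new entry becomes the head
    }

    // Keys stored in one bucket, in chain order (head first).
    public List<TKey> ChainOf(int bucket)
    {
        var keys = new List<TKey>();
        for (int i = buckets[bucket]; i >= 0; i = entries[i].next) keys.Add(entries[i].key);
        return keys;
    }
}
```

Inserting keys 1, 4 and 7 into three buckets puts them all in bucket 1, and `ChainOf(1)` returns them newest first (7, 4, 1). Head insertion is O(1) because it never has to walk to the tail of the chain.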

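Lookup uses the same computation to find the position of the chain's head (the first element in the bucket).
It then walks the chain, comparing hashCode first and then the key, until it finds the target entry.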

for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
	if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
}
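To see the loop above run, the sketch below executes it against hand-built arrays. `LookupSketch`, its public `Entry` struct and `BuildSample` are hypothetical; the real Dictionary keeps these arrays as private fields. The sample state is what head insertion would produce after adding keys 1, 4, 7 and 2 (in that order) to three buckets.

```csharp
using System.Collections.Generic;

// LookupSketch is a hypothetical harness for the chain-walking lookup loop.
public static class LookupSketch
{
    public struct Entry { public int hashCode; public int next; public int key; public string value; }

    // Returns the entries index holding key, or -1 if the key is absent.
    public static int FindEntry(int[] buckets, Entry[] entries, int key)
    {
        var comparer = EqualityComparer<int>.Default;
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
        {
            // Comparing the stored hash code first is cheap; Equals runs only when it matches.
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
        return -1;
    }

    // Bucket 1: entry 2 (key 7) -> entry 1 (key 4) -> entry 0 (key 1); bucket 2: entry 3 (key 2).
    public static Entry[] BuildSample(out int[] buckets)
    {
        buckets = new[] { -1, 2, 3 };
        return new[]
        {
            new Entry { hashCode = 1, next = -1, key = 1, value = "a" },
            new Entry { hashCode = 4, next = 0,  key = 4, value = "b" },
            new Entry { hashCode = 7, next = 1,  key = 7, value = "c" },
            new Entry { hashCode = 2, next = -1, key = 2, value = "d" },
        };
    }
}
```

Looking up key 4 visits entry 2 (key 7) and then finds entry 1; looking up key 10 walks all of bucket 1 and returns -1; key 3 maps to the empty bucket 0 and returns -1 immediately.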

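Insertion details

The source maintains the following two fields during insertion and removal:

    private int freeList;
    private int freeCount;

With these two fields, an insertion can first reuse the slot freed by the most recent removal. This reuses memory well and avoids frequent array growth. I recommend reading the source for the details.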
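A simplified sketch of how those two fields can work together, modelled on the .NET Framework reference source as I understand it: a removed entry gets hashCode -1, its next field is reused to chain freed slots, freeList points at the most recently freed slot, and freeCount counts them. `FreeListSketch` is hypothetical: `int` keys only, fixed capacity with no resizing, and no duplicate-key check.

```csharp
using System;

// FreeListSketch is a hypothetical model of slot reuse via freeList/freeCount; not the real Dictionary.
public class FreeListSketch
{
    private struct Entry { public int hashCode; public int next; public int key; public string value; }

    private readonly int[] buckets;
    private readonly Entry[] entries;
    private int count;         // slots handed out from the end of entries so far
    private int freeList = -1; // most recently freed slot, -1 if none
    private int freeCount;     // number of freed slots waiting for reuse

    public FreeListSketch(int bucketCount, int capacity)
    {
        buckets = new int[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = -1;
        entries = new Entry[capacity];
    }

    // Returns the entries index the item was stored at.
    public int Add(int key, string value)
    {
        int hashCode = key & 0x7FFFFFFF; // an int's hash code is the value itself
        int targetBucket = hashCode % buckets.Length;
        int index;
        if (freeCount > 0)
        {
            index = freeList;               // reuse the most recently freed slot
            freeList = entries[index].next; // freed slots are chained through next
            freeCount--;
        }
        else
        {
            if (count == entries.Length) throw new InvalidOperationException("this sketch does not resize");
            index = count++;
        }
        entries[index].hashCode = hashCode;
        entries[index].next = buckets[targetBucket];
        entries[index].key = key;
        entries[index].value = value;
        buckets[targetBucket] = index;
        return index;
    }

    public bool Remove(int key)
    {
        int hashCode = key & 0x7FFFFFFF;
        int bucket = hashCode % buckets.Length;
        for (int i = buckets[bucket], last = -1; i >= 0; last = i, i = entries[i].next)
        {
            if (entries[i].hashCode != hashCode || entries[i].key != key) continue;

            // Unlink entry i from its bucket chain.
            if (last < 0) buckets[bucket] = entries[i].next;
            else entries[last].next = entries[i].next;

            // Push slot i onto the free list.
            entries[i].hashCode = -1;
            entries[i].next = freeList;
            entries[i].value = null;
            freeList = i;
            freeCount++;
            return true;
        }
        return false;
    }
}
```

After adding keys 1, 4 and 7 (slots 0, 1, 2) and removing 4, the next Add lands in slot 1 instead of slot 3; only once the free list is empty does insertion take fresh slots from the end of the array.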

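3. Thread Safety

Looking at the source, nothing is locked when data is written, so Dictionary is not thread-safe. The documentation says it supports multiple concurrent readers only as long as the collection is not modified; once any thread writes, every access, reads included, needs synchronization.
If multiple threads operate on the same dictionary, you need a lock, as in the following example: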

using System;
using System.Collections;
using System.Collections.Generic;

namespace Extend
{
    /// <summary>
    /// A thread-safe dictionary for internal use: every member locks the inner dictionary.
    /// </summary>
    /// <typeparam name="K">Key type</typeparam>
    /// <typeparam name="V">Value type</typeparam>
    class ThreadSafeDictionary<K, V> : IDictionary<K, V>
    {
        readonly Dictionary<K, V> dic = new Dictionary<K, V>();

        // Warning: code that uses this property directly bypasses the lock.
        public Dictionary<K, V> InnerDictionary { get { return dic; } }

        public V this[K key]
        {
            get
            {
                // Reads lock too: a read that overlaps a write is not safe on Dictionary.
                lock (dic)
                    return dic[key];
            }

            set
            {
                lock (dic)
                    dic[key] = value;
            }
        }

        public int Count
        {
            get
            {
                lock (dic)
                    return dic.Count;
            }
        }

        // This wrapper is always writable.
        public bool IsReadOnly
        {
            get { return false; }
        }

        // Keys and Values return snapshot copies, so callers can enumerate them without holding the lock.
        public ICollection<K> Keys
        {
            get
            {
                lock (dic)
                    return new List<K>(dic.Keys);
            }
        }

        public ICollection<V> Values
        {
            get
            {
                lock (dic)
                    return new List<V>(dic.Values);
            }
        }

        public void Add(KeyValuePair<K, V> item)
        {
            lock (dic)
                dic.Add(item.Key, item.Value);
        }

        public void Add(K key, V value)
        {
            lock (dic)
                dic.Add(key, value);
        }

        public void Clear()
        {
            lock (dic)
                dic.Clear();
        }

        public bool Contains(KeyValuePair<K, V> item)
        {
            // Matches both key and value, as ICollection<T>.Contains requires.
            lock (dic)
                return ((ICollection<KeyValuePair<K, V>>)dic).Contains(item);
        }

        public bool ContainsKey(K key)
        {
            lock (dic)
                return dic.ContainsKey(key);
        }

        public void CopyTo(KeyValuePair<K, V>[] array, int arrayIndex)
        {
            lock (dic)
                ((ICollection<KeyValuePair<K, V>>)dic).CopyTo(array, arrayIndex);
        }

        // Enumerates a snapshot, so concurrent writers cannot invalidate the enumerator.
        public IEnumerator<KeyValuePair<K, V>> GetEnumerator()
        {
            lock (dic)
                return new List<KeyValuePair<K, V>>(dic).GetEnumerator();
        }

        public bool Remove(KeyValuePair<K, V> item)
        {
            // Removes only if both key and value match.
            lock (dic)
                return ((ICollection<KeyValuePair<K, V>>)dic).Remove(item);
        }

        public bool Remove(K key)
        {
            lock (dic)
                return dic.Remove(key);
        }

        public bool TryGetValue(K key, out V value)
        {
            lock (dic)
                return dic.TryGetValue(key, out value);
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
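Locking each call separately, as the wrapper above does, still leaves a gap for check-then-act sequences such as ContainsKey followed by Add, or a read-modify-write counter: another thread can run between the two calls. The sketch below keeps the whole update inside one lock while many threads write in parallel. `LockedCounter` is a hypothetical helper, not part of the class above. For production code, `System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>` offers atomic operations such as `AddOrUpdate` that avoid hand-written locking.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// LockedCounter is a hypothetical helper: parallel writers sharing one Dictionary behind a lock.
public static class LockedCounter
{
    public static Dictionary<int, int> CountInParallel(int iterations, int distinctKeys)
    {
        var dic = new Dictionary<int, int>();
        Parallel.For(0, iterations, i =>
        {
            int key = i % distinctKeys;
            // The read and the write must happen under the same lock,
            // otherwise two threads could read the same count and lose an increment.
            lock (dic)
            {
                dic.TryGetValue(key, out int current); // current is 0 when the key is missing
                dic[key] = current + 1;
            }
        });
        return dic;
    }
}
```

With 10,000 iterations over 10 keys, every key ends with exactly 1,000; without the lock the counts can come out lower, or the dictionary's internal arrays can be corrupted.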