Another Look at the GetHashCode Function




Part 1:




( Figure 1-1)


 


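As Figure 1-1 shows, a collection built on a hash algorithm, such as HashSet<T>, divides its storage into regions (buckets). Suppose there are 32 of them: to find an object, the collection first computes hashcode % 32 and then searches only the region with that index. This clearly makes lookups faster. For a collection that is not hash-based, implementing GetHashCode() has no effect.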
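The region lookup described above can be sketched as follows. This is a teaching sketch only, not how HashSet<T> is actually implemented: the class name BucketedSet<T>, its members, and the fixed count of 32 buckets are assumptions made for illustration, and null items are not handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class; the real HashSet<T> is more elaborate.
public class BucketedSet<T>
{
    private const int BucketCount = 32; // the "32 regions" from the text
    private readonly List<T>[] _buckets = new List<T>[BucketCount];

    // Map a hash code to a bucket index; the sign bit is cleared so the index is never negative.
    public static int BucketIndex(T item)
    {
        return (item.GetHashCode() & 0x7FFFFFFF) % BucketCount;
    }

    public bool Add(T item)
    {
        int index = BucketIndex(item);
        if (_buckets[index] == null) _buckets[index] = new List<T>();
        // Only this one bucket is scanned for an equal element.
        foreach (T existing in _buckets[index])
        {
            if (existing.Equals(item)) return false;
        }
        _buckets[index].Add(item);
        return true;
    }

    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndex(item)];
        if (bucket == null) return false;
        foreach (T existing in bucket)
        {
            if (existing.Equals(item)) return true;
        }
        return false;
    }
}

public static class BucketDemo
{
    public static void Main()
    {
        BucketedSet<int> set = new BucketedSet<int>();
        set.Add(5);
        set.Add(37); // 37 % 32 == 5, so 37 shares a bucket with 5
        Console.WriteLine(BucketedSet<int>.BucketIndex(37)); // 5
        Console.WriteLine(set.Contains(37));                 // True
        Console.WriteLine(set.Contains(69));                 // False: same bucket, but no equal element
    }
}
```

Two objects that are equal but report different hash codes would land in different buckets here, and Add would accept both.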

That said, why does the compiler so often warn us to override GetHashCode() as well once we have overridden Equals()?

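Consider adding an object whose class overrides Equals() but not GetHashCode(). Two things can happen. In the first case, the collection searches the same region that already holds an equal object; Equals() reports the duplicate, so the object is not added. In the second case, the collection searches a different region, because the default GetHashCode() of a class is not based on the fields that Equals() compares; no equal object is found there, so the duplicate is added.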

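This is why the compiler warns us to override GetHashCode() whenever we override Equals(). It also shows again that overriding GetHashCode() matters only for hash-based collections, because only they partition their elements into regions by hash code.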


 


       using System;
       using System.Collections.Generic;

       class Point {

       private int _x; // X coordinate.

       public int X {
           get { return _x; }
           set { _x = value; }
       }
       private int _y; // Y coordinate.

       public int Y {
           get { return _y; }
           set { _y = value; }
       }

       public Point(int x, int y) {
           this._x = x;
           this._y = y;
       }

       // Override Object's Equals() method.
       public override bool Equals(object obj) {
           // Equals() must not throw: null or a non-Point object is simply not equal.
           Point another = obj as Point;
           if (another == null) return false;
           return this._x == another._x && this._y == another._y;
       }

       // Override Object's GetHashCode() method.
       public override int GetHashCode() {
           return X.GetHashCode() ^ Y.GetHashCode();
       }
   }

       // In the Main method of Program:

       class Program {

       static void Main(string[] args) {
           // HashSet (implements a hash algorithm).
           HashSet<Point> points = new HashSet<Point>();

           Point p1 = new Point(1, 1);
           Point p2 = new Point(2, 2);
           Point p3 = new Point(3, 3);

           points.Add(p1);
           points.Add(p2);
           points.Add(p3);
           Console.WriteLine(points.Count);

           // Add a Point with a duplicate value.
           Point p4 = new Point(2, 2);
           points.Add(p4);

           Console.WriteLine(points.Count);
           // If Point does not override GetHashCode(), output: 4.
           // If Point overrides GetHashCode(), output: 3.

           p1.X = 0;  // Modify a field that takes part in computing the hash value.
           points.Remove(p1);
           // Without the modification above, output: 2;
           // with it, output: 3 (the object cannot be removed).
           Console.WriteLine(points.Count);

           Console.ReadKey();
       }
   }

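As the test above shows, in Main, after storing an object (p1) in the hash collection, we modify a field that takes part in the hash computation (our GetHashCode() override in Point uses X), and the object can then no longer be removed.

Note: once an object is stored in a hash collection, the fields that take part in computing its hash code must not be modified; otherwise the object's hash code after the change differs from the one it had when it was stored.

In that case, even calling Contains() with a reference to that very object fails to find it in the hash collection. It also becomes impossible to remove that object from the collection individually, which can cause a memory leak.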
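One way to avoid the problem is to take the object out of the collection before changing it and to add it back afterwards. The sketch below contrasts the two orders of operations. MutablePoint and MutationDemo are hypothetical names introduced for this example; they mirror the Point class above but are not part of it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Point class, kept here so the sketch is self-contained.
public class MutablePoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public MutablePoint(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        MutablePoint other = obj as MutablePoint;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }
}

public static class MutationDemo
{
    // Changes X while the object is still inside the set.
    public static bool FoundAfterInPlaceChange()
    {
        HashSet<MutablePoint> set = new HashSet<MutablePoint>();
        MutablePoint p = new MutablePoint(1, 1);
        set.Add(p);   // stored under hash 1 ^ 1 == 0
        p.X = 0;      // hash is now 0 ^ 1 == 1
        return set.Contains(p);
    }

    // Removes the object first, changes it, then adds it back under its new hash.
    public static bool FoundAfterRemoveChangeAdd()
    {
        HashSet<MutablePoint> set = new HashSet<MutablePoint>();
        MutablePoint p = new MutablePoint(1, 1);
        set.Add(p);
        set.Remove(p);
        p.X = 0;
        set.Add(p);
        return set.Contains(p);
    }

    public static void Main()
    {
        Console.WriteLine(FoundAfterInPlaceChange());   // False
        Console.WriteLine(FoundAfterRemoveChangeAdd()); // True
    }
}
```

Making the hashed fields read-only after construction removes the risk altogether, at the cost of creating a new object for every change.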


 



Part 2:


To compare objects for equality, either implement IEquatable<T> on the type itself or write a separate class that implements the IEqualityComparer<T> interface.
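The second option, a separate comparer class, might look like the sketch below. OrderKey and OrderKeyComparer are hypothetical names used only here, and the way the two fields are combined into a hash code is one common convention rather than a required formula.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type, used only for this sketch.
public class OrderKey
{
    public int ID { get; set; }
    public int SubID { get; set; }
}

// Equality is defined outside the key type, in a separate comparer class.
public class OrderKeyComparer : IEqualityComparer<OrderKey>
{
    public bool Equals(OrderKey x, OrderKey y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.ID == y.ID && x.SubID == y.SubID;
    }

    public int GetHashCode(OrderKey key)
    {
        if (key == null) return 0;
        // Equal keys must produce equal hash codes; overflow is allowed to wrap.
        unchecked { return key.ID * 31 + key.SubID; }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        List<OrderKey> keys = new List<OrderKey> {
            new OrderKey { ID = 1, SubID = 2 },
            new OrderKey { ID = 1, SubID = 3 }
        };
        OrderKey probe = new OrderKey { ID = 1, SubID = 2 };
        OrderKeyComparer comparer = new OrderKeyComparer();

        // List<T>.Contains has no comparer parameter, so this call binds to the
        // LINQ overload Enumerable.Contains(source, value, comparer).
        Console.WriteLine(keys.Contains(probe, comparer)); // True

        // Hash-based collections accept the comparer in their constructor.
        HashSet<OrderKey> set = new HashSet<OrderKey>(keys, comparer);
        Console.WriteLine(set.Add(probe)); // False: an equal key is already present
    }
}
```

A comparer is useful when the key type cannot be modified, or when different callers need different notions of equality for the same type.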

If a custom type does not implement IEquatable<T>, methods such as List<T>.Contains fall back to object's Equals, which compares references for classes, so the result is not what we expect.

First, define a class:

public class DaichoKey
{
    public int ID { get; set; }
    public int SubID { get; set; }
}

List<DaichoKey> lst = new List<DaichoKey>() {
    new DaichoKey() { ID = 1, SubID = 2 },
    new DaichoKey() { ID = 1, SubID = 3 }
};
var newItem = new DaichoKey() { ID = 1, SubID = 2 };
bool isContains = lst.Contains(newItem); // false

The code above gets false from Contains, although we expected true: an object with ID 1 and SubID 2 is already in the list.

To get that behavior, the class needs to implement the IEquatable<T> interface.

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get; set; }
    public int SubID { get; set; }

    public bool Equals(DaichoKey other)
    {
        // A null argument is never equal to this instance.
        return other != null && this.ID == other.ID && this.SubID == other.SubID;
    }
}

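With this improvement the result matches our expectation, but it is still incomplete. Microsoft recommends also overriding object's Equals and GetHashCode methods to keep the semantics consistent, which gives the following code: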

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get; set; }
    public int SubID { get; set; }

    public bool Equals(DaichoKey other)
    {
        // A null argument is never equal to this instance.
        return other != null && this.ID == other.ID && this.SubID == other.SubID;
    }
    public override bool Equals(object obj)
    {
        // Equals must not throw: null or an object of another type is simply not equal.
        return Equals(obj as DaichoKey);
    }
    public override int GetHashCode()
    {
        return base.GetHashCode(); // return object's hashcode
    }
}

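The code above still has a flaw: it does not overload the == and != operators, but that is not the focus of this article. After this long detour we finally arrive at GetHashCode. It seems to have no effect on Contains, so what is the harm in not overriding it? Let's try Distinct, a LINQ extension method that can be called on List<T>: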

List<DaichoKey> lst = new List<DaichoKey>() {
    new DaichoKey() { ID = 1, SubID = 2 },
    new DaichoKey() { ID = 1, SubID = 3 }
};
var newItem = new DaichoKey() { ID = 1, SubID = 2 };
lst.Add(newItem);
if (lst != null)
{
    lst = lst.Distinct<DaichoKey>().ToList();
}
//result:
//1 2
//1 3
//1 2

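Disaster: the duplicate entry 1, 2 was not removed, even though we implemented IEquatable<T>. I found an article on cnblogs ("c# 扩展方法奇思妙用基础篇八:Distinct 扩展"), where a reply suggests having GetHashCode return a fixed value to force the call to IEquatable<T>'s Equals method, as follows: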

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get; set; }
    public int SubID { get; set; }

    public bool Equals(DaichoKey other)
    {
        // A null argument is never equal to this instance.
        return other != null && this.ID == other.ID && this.SubID == other.SubID;
    }
    public override bool Equals(object obj)
    {
        // Equals must not throw: null or an object of another type is simply not equal.
        return Equals(obj as DaichoKey);
    }
    public override int GetHashCode()
    {
        return 0; //base.GetHashCode();
    }
}

The result is immediately correct. Does Distinct compare hash codes before anything else?

With that question in mind, I decompiled Distinct, and it is indeed as I guessed. The source code is below for anyone interested:

public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source)
{
    if (source == null) throw Error.ArgumentNull("source");
    return DistinctIterator<TSource>(source, null);
}
 
 private static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
{
    <DistinctIterator>d__81<TSource> d__ = new <DistinctIterator>d__81<TSource>(-2);
    d__.<>3__source = source;
    d__.<>3__comparer = comparer;
    return d__;
}
 
 private sealed class <DistinctIterator>d__81<TSource> : IEnumerable<TSource>, IEnumerable, IEnumerator<TSource>, IEnumerator, IDisposable
{
    // Fields
    private int <>1__state;
    private TSource <>2__current;
    public IEqualityComparer<TSource> <>3__comparer;
    public IEnumerable<TSource> <>3__source;
    public IEnumerator<TSource> <>7__wrap84;
    private int <>l__initialThreadId;
    public TSource <element>5__83;
    public Set<TSource> <set>5__82;
    public IEqualityComparer<TSource> comparer;
    public IEnumerable<TSource> source;
 
    // Methods
    [DebuggerHidden]
    public <DistinctIterator>d__81(int <>1__state);
    private void <>m__Finally85();
    private bool MoveNext();
    [DebuggerHidden]
    IEnumerator<TSource> IEnumerable<TSource>.GetEnumerator();
    [DebuggerHidden, TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    IEnumerator IEnumerable.GetEnumerator();
    [DebuggerHidden]
    void IEnumerator.Reset();
    void IDisposable.Dispose();
 
    // Properties
    TSource IEnumerator<TSource>.Current { [DebuggerHidden] get; }
    object IEnumerator.Current { [DebuggerHidden] get; }
}
 
private bool MoveNext()
{
    bool flag;
    try
    {
        switch (this.<>1__state)
        {
            case 0:
                this.<>1__state = -1;
                this.<set>5__82 = new Set<TSource>(this.comparer);
                this.<>7__wrap84 = this.source.GetEnumerator();
                this.<>1__state = 1;
                goto Label_0092;
 
            case 2:
                this.<>1__state = 1;
                goto Label_0092;
 
            default:
                goto Label_00A5;
        }
    Label_0050:
        this.<element>5__83 = this.<>7__wrap84.Current;
        if (this.<set>5__82.Add(this.<element>5__83))
        {
            this.<>2__current = this.<element>5__83;
            this.<>1__state = 2;
            return true;
        }
    Label_0092:
        if (this.<>7__wrap84.MoveNext()) goto Label_0050;
        this.<>m__Finally85();
    Label_00A5:
        flag = false;
    }
    fault
    {
        this.System.IDisposable.Dispose();
    }
    return flag;
}
 
internal class Set<TElement>
{
    // Fields
    private int[] buckets;
    private IEqualityComparer<TElement> comparer;
    private int count;
    private int freeList;
    private Slot<TElement>[] slots;
 
    // Methods
    [TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    public Set();
    public Set(IEqualityComparer<TElement> comparer);
    public bool Add(TElement value);
    [TargetedPatchingOptOut("Performance critical to inline this type of method across NGen image boundaries")]
    public bool Contains(TElement value);
    private bool Find(TElement value, bool add);
    internal int InternalGetHashCode(TElement value);
    public bool Remove(TElement value);
    private void Resize();
 
    // Nested Types
    [StructLayout(LayoutKind.Sequential)]
    internal struct Slot
    {
        internal int hashCode;
        internal TElement value;
        internal int next;
    }
}
public bool Add(TElement value)
{
    return !this.Find(value, true);
}
  
public bool Contains(TElement value)
{
    return this.Find(value, false);
}
 
private bool Find(TElement value, bool add)
{
    int hashCode = this.InternalGetHashCode(value);
    for (int i = this.buckets[hashCode % this.buckets.Length] - 1; i >= 0; i = this.slots[i].next)
    {
        if (this.slots[i].hashCode == hashCode && this.comparer.Equals(this.slots[i].value, value)) return true; // this is the key line: Equals runs only when the hash codes match
    }
    if (add)
    {
        int freeList;
        if (this.freeList >= 0)
        {
            freeList = this.freeList;
            this.freeList = this.slots[freeList].next;
        }
        else
        {
            if (this.count == this.slots.Length) this.Resize();
            freeList = this.count;
            this.count++;
        }
        int index = hashCode % this.buckets.Length;
        this.slots[freeList].hashCode = hashCode;
        this.slots[freeList].value = value;
        this.slots[freeList].next = this.buckets[index] - 1;
        this.buckets[index] = freeList + 1;
    }
    return false;
}


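This code shows that the Distinct extension method uses an internal Set<T> class to eliminate duplicates. That class stores its data in a hash table, so it calls our type's GetHashCode function; if the hash codes of two objects differ, it never calls Equals to compare them.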
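The hash-first behavior can be observed from the outside by counting Equals calls. In the sketch below, CountingComparer and DistinctCostDemo are hypothetical names introduced for this example; the comparer either returns a constant hash code or one hash code per value, and the program reports how often Distinct asked it to run Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that records how often Equals is invoked.
public class CountingComparer : IEqualityComparer<int>
{
    private readonly bool _constantHash;

    public int EqualsCalls { get; private set; }

    public CountingComparer(bool constantHash) { _constantHash = constantHash; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int value)
    {
        // Constant hash: every element collides. Otherwise: one hash per value.
        return _constantHash ? 0 : value;
    }
}

public static class DistinctCostDemo
{
    // Runs Distinct over the values 0..99 and returns the number of Equals calls made.
    public static int CountEqualsCalls(bool constantHash)
    {
        CountingComparer comparer = new CountingComparer(constantHash);
        List<int> result = Enumerable.Range(0, 100).Distinct(comparer).ToList();
        if (result.Count != 100) throw new InvalidOperationException("unexpected result");
        return comparer.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("constant hash:  " + CountEqualsCalls(true) + " Equals calls");
        Console.WriteLine("per-value hash: " + CountEqualsCalls(false) + " Equals calls");
    }
}
```

Both runs return the same 100 elements, but the constant hash forces Equals to be called for every pair that lands in the same bucket, which is why returning 0 works yet scales poorly.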

The cause is now clear. The conclusions are:

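1. When you override Equals, override GetHashCode too. Do not simply call object's GetHashCode; return a well-designed hash value in which equal objects have equal hash codes, so that results match expectations. The approach above returns 0 directly; it solves the problem, but it puts every object into the same hash bucket, so each lookup degrades to comparing against every stored element. It is a poor choice.

2. The Contains and IndexOf methods of List<T> do not use GetHashCode.

3. The Distinct and Except extension methods do use GetHashCode, so it must be overridden for them to work with value equality. Other functions that use GetHashCode will be added later; just take care when using them.

4. If an object is to be used as the key of a Dictionary, GetHashCode must be overridden consistently with Equals.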
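As a sketch of what "a well-designed hash value" in conclusion 1 could look like for DaichoKey, the version below combines both fields instead of returning 0. The multiply-and-add scheme with 17 and 31 is one common convention, chosen here as an assumption and not something the original article prescribes; HashDemo is a hypothetical name for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DaichoKey : IEquatable<DaichoKey>
{
    public int ID { get; set; }
    public int SubID { get; set; }

    public bool Equals(DaichoKey other)
    {
        return other != null && ID == other.ID && SubID == other.SubID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as DaichoKey);
    }

    public override int GetHashCode()
    {
        // Uses exactly the fields that Equals compares, so equal keys hash equally.
        // unchecked lets the arithmetic wrap instead of throwing on overflow.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + ID;
            hash = hash * 31 + SubID;
            return hash;
        }
    }
}

public static class HashDemo
{
    public static void Main()
    {
        List<DaichoKey> lst = new List<DaichoKey> {
            new DaichoKey { ID = 1, SubID = 2 },
            new DaichoKey { ID = 1, SubID = 3 },
            new DaichoKey { ID = 1, SubID = 2 }
        };
        Console.WriteLine(lst.Distinct().Count()); // 2

        Dictionary<DaichoKey, string> names = new Dictionary<DaichoKey, string>();
        names[new DaichoKey { ID = 1, SubID = 2 }] = "first";
        Console.WriteLine(names.ContainsKey(new DaichoKey { ID = 1, SubID = 2 })); // True
    }
}
```

Because ID and SubID are settable, the warning from Part 1 still applies: a key that is already inside a hash-based collection should not be modified.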

Main reference: http://www.cnblogs.com/xiashengwang/archive/2013/03/04/2942555.html

