Comparing Custom Approaches When Removing Duplicate Data with Distinct [Repost]

Latest recommended article published 2022-11-11 11:22:18

weixin_30895603


Original article: http://www.cnblogs.com/yougmi/p/4562129.html


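While using LINQ's Distinct in a recent project to remove duplicate records, I found that it behaves a little differently from methods such as Any: you cannot simply pass the duplicate-detection logic into the call. So when we want to decide what counts as a duplicate based on a particular member property, we have to take a detour. Here I summarize and record the ways to handle this.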

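To make the following explanation easier, we first prepare the data type used later. As usual, I use the Person type I most often use for demonstrations; it has two member properties, ID and Name.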

 
    public struct Person
    {
        #region Property
        /// <summary>
        /// Gets or sets the ID.
        /// </summary>
        /// <value>The ID.</value>
        public string ID { get; set; }

        /// <summary>
        /// Gets or sets the name.
        /// </summary>
        /// <value>The name.</value>
        public string Name { get; set; }
        #endregion

        #region Public Method
        /// <summary>
        /// Returns a <see cref="System.String"/> that represents this instance.
        /// </summary>
        /// <returns>
        /// A <see cref="System.String"/> that represents this instance.
        /// </returns>
        public override string ToString()
        {
            return Name;
        }
        #endregion
    }

Next, prepare the test data: eleven Person objects. The first ten are all named Larry and the eleventh is named LastLarry. We expect Distinct to filter out the duplicate Larry entries.

 
    var datas = new List<Person>();
    int idx = 0;
    for (idx = 0; idx < 10; ++idx)
    {
        datas.Add(new Person() { ID = idx.ToString(), Name = "Larry" });
    }
    datas.Add(new Person() { ID = idx.ToString(), Name = "LastLarry" });
    // ...

Suppose we simply use the built-in Distinct method to filter the data:

 
    var distinctDatas = datas.Distinct();
    ShowDatas(distinctDatas);
    // ...
    private static void ShowDatas<T>(IEnumerable<T> datas)
    {
        foreach (var data in datas)
        {
            Console.WriteLine(data.ToString());
        }
    }

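Running it shows that the result is not what we expected: the output is the same as if nothing had been filtered. The reason is that the parameterless Distinct() compares elements with EqualityComparer<Person>.Default. Since Person is a struct that does not override Equals, this falls back to ValueType.Equals, which compares every field, ID included, and every Person here has a different ID.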
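To check that reasoning, here is a minimal sketch that asks `EqualityComparer<Person>.Default` (the comparer the parameterless `Distinct()` uses) to compare two `Person` values, then re-runs `Distinct()` on the article's eleven records. The `DefaultEqualityDemo` class and its method names are illustrative, and `Person` is trimmed to its two properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the article's Person struct; ToString is omitted for brevity.
public struct Person
{
    public string ID { get; set; }
    public string Name { get; set; }
}

public static class DefaultEqualityDemo
{
    // EqualityComparer<Person>.Default is what the parameterless Distinct() uses.
    static bool DefaultEquals(Person a, Person b)
    {
        return EqualityComparer<Person>.Default.Equals(a, b);
    }

    public static bool SameNameDifferentId()
    {
        return DefaultEquals(new Person { ID = "0", Name = "Larry" },
                             new Person { ID = "1", Name = "Larry" });
    }

    public static bool SameNameSameId()
    {
        return DefaultEquals(new Person { ID = "0", Name = "Larry" },
                             new Person { ID = "0", Name = "Larry" });
    }

    // Rebuilds the article's test records and counts what Distinct() keeps.
    public static int DistinctCount()
    {
        var datas = new List<Person>();
        int idx = 0;
        for (idx = 0; idx < 10; ++idx)
        {
            datas.Add(new Person { ID = idx.ToString(), Name = "Larry" });
        }
        datas.Add(new Person { ID = idx.ToString(), Name = "LastLarry" });
        return datas.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(SameNameDifferentId()); // False: the IDs differ
        Console.WriteLine(SameNameSameId());      // True: every field matches
        Console.WriteLine(DistinctCount());       // 11: nothing is removed
    }
}
```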

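To solve this, we need a comparer class that compares by Person.Name. It must implement IEqualityComparer<Person>.Equals and IEqualityComparer<Person>.GetHashCode, and an instance of it is passed to Distinct. Both methods matter: Distinct buckets elements by hash code before calling Equals, so two elements that Equals treats as equal must also return the same hash code.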

 
    distinctDatas = datas.Distinct(new PersonCompare());
    ShowDatas(distinctDatas);
    // ...
    class PersonCompare : IEqualityComparer<Person>
    {
        #region IEqualityComparer<Person> Members

        public bool Equals(Person x, Person y)
        {
            // string.Equals(a, b) also handles a null Name, unlike x.Name.Equals(y.Name)
            return string.Equals(x.Name, y.Name);
        }

        public int GetHashCode(Person obj)
        {
            // Hash the same member Equals compares; a null Name hashes to 0
            return obj.Name == null ? 0 : obj.Name.GetHashCode();
        }

        #endregion
    }

Now the output is what we expected: only Larry and LastLarry are printed.
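A self-contained sketch of the comparer approach, handy for checking which element survives. The `ComparerDemo` harness is hypothetical, `Person` is trimmed, and the comparer compares `Name` with the null-safe `string.Equals`. In practice `Distinct` keeps the first element it meets for each key (an implementation detail; the documentation describes the result as unordered), so the Larry with ID 0 is the one that remains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Person
{
    public string ID { get; set; }
    public string Name { get; set; }
}

// Two people are duplicates when their names match; ID is ignored.
class PersonCompare : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        return string.Equals(x.Name, y.Name);
    }

    // Hashes the same member Equals compares, so equal names share a bucket.
    public int GetHashCode(Person obj)
    {
        return obj.Name == null ? 0 : obj.Name.GetHashCode();
    }
}

public static class ComparerDemo
{
    // Returns "ID:Name" for every element Distinct keeps.
    public static List<string> Run()
    {
        var datas = new List<Person>();
        int idx = 0;
        for (idx = 0; idx < 10; ++idx)
        {
            datas.Add(new Person { ID = idx.ToString(), Name = "Larry" });
        }
        datas.Add(new Person { ID = idx.ToString(), Name = "LastLarry" });

        return datas.Distinct(new PersonCompare())
                    .Select(p => p.ID + ":" + p.Name)
                    .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line); // 0:Larry, then 10:LastLarry
        }
    }
}
```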


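The catch is that every new type needs its own comparer class, which is quite inconvenient. So some developers proposed a shared comparer class built with generics and reflection.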

 
    using System;
    using System.Collections.Generic;
    using System.Reflection;

    public class PropertyComparer<T> : IEqualityComparer<T>
    {
        private PropertyInfo _PropertyInfo;

        /// <summary>
        /// Creates a new instance of PropertyComparer.
        /// </summary>
        /// <param name="propertyName">The name of the property on type T
        /// to perform the comparison on.</param>
        public PropertyComparer(string propertyName)
        {
            //store a reference to the property info object for use during the comparison
            _PropertyInfo = typeof(T).GetProperty(propertyName,
                BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
            if (_PropertyInfo == null)
            {
                throw new ArgumentException(string.Format("{0} is not a property of type {1}.", propertyName, typeof(T)));
            }
        }

        #region IEqualityComparer<T> Members

        public bool Equals(T x, T y)
        {
            //get the current value of the comparison property of x and of y
            object xValue = _PropertyInfo.GetValue(x, null);
            object yValue = _PropertyInfo.GetValue(y, null);

            //if the xValue is null then we consider them equal if and only if yValue is null
            if (xValue == null)
                return yValue == null;

            //use the default comparer for whatever type the comparison property is.
            return xValue.Equals(yValue);
        }

        public int GetHashCode(T obj)
        {
            //get the value of the comparison property out of obj
            object propertyValue = _PropertyInfo.GetValue(obj, null);

            if (propertyValue == null)
                return 0;
            else
                return propertyValue.GetHashCode();
        }

        #endregion
    }

To use it, just pass the generic type argument and the name of the member property, and you get the comparer you need:

    distinctDatas = datas.Distinct(new PropertyComparer<Person>("Name"));
    ShowDatas(distinctDatas);
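The sketch below exercises the reflection-based comparer with a condensed copy of `PropertyComparer<T>`: de-duplicating by `"Name"` keeps two elements, by `"ID"` keeps all eleven, and an unknown property name fails in the constructor. `PropertyComparerDemo`, `CountBy`, and the property name `"Age"` are hypothetical; `Person` has no `Age` property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public struct Person
{
    public string ID { get; set; }
    public string Name { get; set; }
}

// Condensed copy of the article's PropertyComparer<T>.
public class PropertyComparer<T> : IEqualityComparer<T>
{
    private PropertyInfo _PropertyInfo;

    public PropertyComparer(string propertyName)
    {
        _PropertyInfo = typeof(T).GetProperty(propertyName,
            BindingFlags.GetProperty | BindingFlags.Instance | BindingFlags.Public);
        if (_PropertyInfo == null)
        {
            throw new ArgumentException(string.Format("{0} is not a property of type {1}.", propertyName, typeof(T)));
        }
    }

    public bool Equals(T x, T y)
    {
        object xValue = _PropertyInfo.GetValue(x, null);
        object yValue = _PropertyInfo.GetValue(y, null);
        if (xValue == null)
            return yValue == null;
        return xValue.Equals(yValue);
    }

    public int GetHashCode(T obj)
    {
        object propertyValue = _PropertyInfo.GetValue(obj, null);
        return propertyValue == null ? 0 : propertyValue.GetHashCode();
    }
}

public static class PropertyComparerDemo
{
    static List<Person> BuildData()
    {
        var datas = new List<Person>();
        int idx = 0;
        for (idx = 0; idx < 10; ++idx)
        {
            datas.Add(new Person { ID = idx.ToString(), Name = "Larry" });
        }
        datas.Add(new Person { ID = idx.ToString(), Name = "LastLarry" });
        return datas;
    }

    // Counts the elements kept when de-duplicating by the named property.
    public static int CountBy(string propertyName)
    {
        return BuildData().Distinct(new PropertyComparer<Person>(propertyName)).Count();
    }

    // "Age" is a hypothetical name; Person has no such property, so the constructor throws.
    public static bool RejectsUnknownProperty()
    {
        try
        {
            new PropertyComparer<Person>("Age");
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountBy("Name"));          // 2
        Console.WriteLine(CountBy("ID"));            // 11
        Console.WriteLine(RejectsUnknownProperty()); // True
    }
}
```

Because the property is named by a plain string, a misspelled name is caught only at run time, when the constructor throws.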

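This approach removes much of the extra work, but something still feels missing. We still have to create a comparer object, and reflection carries a performance cost: for every element Distinct processes, each Equals call reads the property twice through PropertyInfo.GetValue and each GetHashCode call reads it once, which is not very elegant. Others noticed this as well and used an extension method to offer a more familiar route: pass a lambda directly to decide how elements are de-duplicated.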

 
    using System;
    using System.Collections.Generic;

    public static class EnumerableExtender
    {
        public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
        {
            // Keys already emitted; HashSet.Add returns false for a repeated key
            HashSet<TKey> seenKeys = new HashSet<TKey>();
            foreach (TSource element in source)
            {
                var elementValue = keySelector(element);
                if (seenKeys.Add(elementValue))
                {
                    yield return element;
                }
            }
        }
    }

Usage becomes much easier to write:

    distinctDatas = datas.Distinct(person => person.Name);
    ShowDatas(distinctDatas);
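On .NET 6 and later, `System.Linq` already includes `Enumerable.DistinctBy(source, keySelector)`, which covers the same need as the hand-written extension. A minimal sketch, assuming a .NET 6+ target; `DistinctByDemo` is an illustrative harness and `Person` is trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Person
{
    public string ID { get; set; }
    public string Name { get; set; }
}

public static class DistinctByDemo
{
    // Returns "ID:Name" for every element DistinctBy keeps.
    public static List<string> Run()
    {
        var datas = new List<Person>();
        int idx = 0;
        for (idx = 0; idx < 10; ++idx)
        {
            datas.Add(new Person { ID = idx.ToString(), Name = "Larry" });
        }
        datas.Add(new Person { ID = idx.ToString(), Name = "LastLarry" });

        // Enumerable.DistinctBy is built into System.Linq from .NET 6 onward.
        return datas.DistinctBy(person => person.Name)
                    .Select(p => p.ID + ":" + p.Name)
                    .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

On older frameworks, the `EnumerableExtender` above is still needed.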

If you would rather not add an extra class, grouping gives a similar result:

 
    distinctDatas = from data in datas
                    group data by data.Name into g
                    select g.First();
    ShowDatas(distinctDatas);
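The query expression above is translated into a `GroupBy` call followed by `Select`. The sketch below writes the same thing in method syntax; `GroupDemo` is an illustrative harness. `GroupBy` yields groups in the order their keys first appear and keeps each group's elements in source order, so `g.First()` is the first person with each name. One difference from the `HashSet`-based extension: `GroupBy` reads the entire source before yielding the first group, whereas the extension yields elements as it goes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Person
{
    public string ID { get; set; }
    public string Name { get; set; }
}

public static class GroupDemo
{
    // Returns "ID:Name" for the first element of each Name group.
    public static List<string> Run()
    {
        var datas = new List<Person>();
        int idx = 0;
        for (idx = 0; idx < 10; ++idx)
        {
            datas.Add(new Person { ID = idx.ToString(), Name = "Larry" });
        }
        datas.Add(new Person { ID = idx.ToString(), Name = "LastLarry" });

        // Method-syntax form of: from data in datas group data by data.Name into g select g.First()
        return datas.GroupBy(data => data.Name)
                    .Select(g => g.First())
                    .Select(p => p.ID + ":" + p.Name)
                    .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```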