Handling missing keys in Python dictionaries, five approaches: not in, defaultdict, dict.get, Counter, __missing__

雷承霖

Last modified: 2022-04-04 22:15:37

First published: 2022-04-04 22:12:52

Article link: https://blog.csdn.net/lclkkking/article/details/123960594

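This article introduces efficient counting techniques with Python dictionaries, including defaultdict, the get method, and the Counter class. It also shows how to handle missing keys with the __missing__ method, with practical examples such as grouping words by first letter and customizing data loading.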

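The dictionary is one of the most frequently used data structures in Python. Even so, many people who come to Python from other languages, as well as Python beginners, still use dictionaries with the habits of their previous language and miss out on what Python itself offers. Below I summarize some common scenarios that can make your code more "Pythonic".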

We often use a dict for counting, as in the following code:

Method 1: not in

arr = [1, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2]
count = {}
for i in arr:
    # Check first: count[i] += 1 on a missing key would raise KeyError.
    if i not in count:
        count[i] = 1
    else:
        count[i] += 1
print(count)  # {1: 3, 3: 3, 2: 5}

Accessing a key that does not exist raises KeyError, so we have to check whether the key is present first. There is a better way, though: use defaultdict.

Method 2: defaultdict

from collections import defaultdict

arr = [1, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2]
count = defaultdict(int)  # missing keys start at int(), which is 0
for i in arr:
    count[i] += 1
print(count)  # defaultdict(<class 'int'>, {1: 3, 3: 3, 2: 5})
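
One detail worth knowing, shown here as a small sketch rather than a full explanation: a defaultdict calls its factory the first time a missing key is indexed (int() returns 0) and stores the result. Merely reading a missing key therefore adds it to the dictionary. The key names below are made up for illustration.

```python
from collections import defaultdict

count = defaultdict(int)

# Indexing a missing key calls int(), stores the result 0, and returns it.
value = count["unseen"]
keys_after_read = list(count)

# Membership tests and get() do not trigger the factory.
has_other = "other" in count
got_other = count.get("other")

print(value, keys_after_read, has_other, got_other)
```

If you only want to inspect a defaultdict without growing it, use in or get instead of square brackets.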

You can also call the get method:

Method 3: get

arr = [1, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2]
count = {}
for i in arr:
    # get returns 0 when i has not been seen yet.
    num = count.get(i, 0)
    count[i] = num + 1
print(count)  # {1: 3, 3: 3, 2: 5}
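
For comparison, here is a brief sketch of how get differs from the related built-in method dict.setdefault, which the text above does not cover: get only returns the fallback value, while setdefault also stores it. The grouping task and word list are made up for illustration.

```python
words = ["apple", "armour", "ball"]

groups = {}
for word in words:
    # setdefault returns the existing list, or stores and returns a new one.
    groups.setdefault(word[0], []).append(word)

# get returns the fallback without storing anything.
missing = groups.get("z", [])

print(groups)
print(missing, "z" in groups)
```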

For counting specifically, however, there is a dedicated class:

Method 4: Counter

from collections import Counter

arr = [1, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2]
count = Counter(arr)  # counts every element in one pass
print(count)  # Counter({2: 5, 1: 3, 3: 3})
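
A short sketch of two Counter behaviors that matter for missing keys, using the same list: looking up a key that was never counted returns 0 without raising KeyError and without adding the key, and most_common returns entries sorted from the highest count down.

```python
from collections import Counter

arr = [1, 3, 2, 3, 2, 1, 2, 3, 2, 1, 2]
count = Counter(arr)

# A key that never appeared yields 0 and is not inserted.
absent = count[99]
still_absent = 99 not in count

# most_common(n) lists the n highest counts, highest first.
top = count.most_common(1)

print(absent, still_absent, top)
```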

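The argument passed to defaultdict in Method 2 (its default factory) does not have to be int. Other types work as well, for example set, which is also commonly used. The following program groups words by their first letter: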

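from collections import defaultdict

words = ['apple', 'dog', 'cat', 'person', 'hat', 'armour', 'ball']
data = defaultdict(set)  # each missing key starts as an empty set

for word in words:
    data[word[0]].add(word)

# The order of elements inside each set may vary between runs.
print(data)  # defaultdict(<class 'set'>, {'a': {'apple', 'armour'}, 'd': {'dog'}, 'c': {'cat'}, 'p': {'person'}, 'h': {'hat'}, 'b': {'ball'}})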
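
The factory does not have to be a type at all. As a sketch, any callable that takes no arguments works, such as list or a lambda. The data and the starting score of 100 below are invented for the example.

```python
from collections import defaultdict

# list keeps duplicates and insertion order, unlike set.
by_length = defaultdict(list)
for word in ["cat", "dog", "apple", "cat"]:
    by_length[len(word)].append(word)

# A lambda can supply any starting value, here a hypothetical initial score.
scores = defaultdict(lambda: 100)
scores["alice"] -= 10

print(dict(by_length))
print(scores["alice"], scores["bob"])
```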

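The methods so far should cover most missing-key situations. But what if something more complex has to happen when a key is missing? Suppose, for example, that each key is the name of a dataset and each value is that dataset's data. We could read the data at the point where we notice the key is missing, but a better approach is to override the __missing__ method and define there what happens on a miss.

Method 5: __missing__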

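class DataSet(dict):
    def loadData(self, file):
        # Placeholder: read the dataset named by `file` and return its data.
        pass

    def __missing__(self, key):
        # Called by self[key] only when key is not in the dictionary.
        value = self.loadData(key)
        self[key] = value  # store it so the next lookup does not reload
        return value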
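
To make the idea concrete, here is a minimal runnable sketch. The class name LazyDataSet, the load_data method, and the fake loader that just builds a list are all hypothetical stand-ins for real file reading. The point is only that the loader runs once per key.

```python
class LazyDataSet(dict):
    """Dictionary that loads a dataset the first time its name is requested."""

    def __init__(self):
        super().__init__()
        self.load_calls = 0  # counts how often the loader actually runs

    def load_data(self, name):
        # Hypothetical loader: a real one might read a file called `name`.
        self.load_calls += 1
        return [f"{name}-row-{i}" for i in range(3)]

    def __missing__(self, key):
        # Called by square-bracket lookup only when `key` is absent.
        value = self.load_data(key)
        self[key] = value  # cache so later lookups skip the loader
        return value


datasets = LazyDataSet()
first = datasets["train"]
second = datasets["train"]  # served from the cache

print(first, datasets.load_calls)
```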

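The DataSet class is only a skeleton and may feel abstract, so here is the first-letter grouping problem solved with the __missing__ method: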

class DictTest(dict):
    def __missing__(self, key):
        # Create, store, and return an empty set for any missing key.
        value = set()
        self[key] = value
        return value


words = ['apple', 'dog', 'cat', 'person', 'hat', 'armour', 'ball']
dc = DictTest()
for word in words:
    dc[word[0]].add(word)
# The order of elements inside each set may vary between runs.
print(dc)  # {'a': {'armour', 'apple'}, 'd': {'dog'}, 'c': {'cat'}, 'p': {'person'}, 'h': {'hat'}, 'b': {'ball'}}
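
One caveat, sketched below: __missing__ is invoked only by square-bracket lookup. dict.get and the in operator bypass it, so they neither call the hook nor create the key. The class name SetDict is hypothetical.

```python
class SetDict(dict):
    def __missing__(self, key):
        value = set()
        self[key] = value
        return value


d = SetDict()
via_get = d.get("a")   # get() does not call __missing__
via_in = "a" in d      # nothing was stored by get()
via_index = d["a"]     # __missing__ runs here and stores the key
stored_now = "a" in d

print(via_get, via_in, via_index, stored_now)
```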