Getting ready
- Learn to use a script to compute word frequency, that is, the number of times each word occurs in a text.
How to do it
# Each continued line ends with a space so adjacent words do not run together.
sentence = "Peter Piper picked a peck of pickled peppers A peck of pickled \
peppers Peter Piper picked If Peter Piper a peck of picked \
peppers Wheres the peck of pickled peppers Peter Piper Picked"
word_dict = {}
for word in sentence.split():
    # First occurrence: create the entry; otherwise increment the count.
    if word not in word_dict:
        word_dict[word] = 1
    else:
        word_dict[word] += 1
for key, value in word_dict.items():
    print("{}: {}".format(key, value))
# print(word_dict)
The output is as follows:
Peter: 4
Piper: 4
picked: 3
a: 2
peck: 4
of: 4
pickled: 3
peppers: 4
A: 1
If: 1
Wheres: 1
the: 1
Picked: 1
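How it works:
- A dictionary stores key-value pairs; here each key is a word and its value is the number of times that word occurs.
- The items() method returns all key-value pairs in the dictionary, which the for loop then iterates over.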
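The if/else counting step can also be written with dict.get(), which returns a fallback value when the key is absent. The following is only a sketch of that variant; the helper name count_words and the shortened sample text are made up for illustration.

```python
def count_words(text):
    """Count whitespace-separated words with a plain dict (hypothetical helper)."""
    word_dict = {}
    for word in text.split():
        # get() returns 0 for a word that has not been seen yet.
        word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict

result = count_words("Peter Piper picked a peck Peter")
for key, value in result.items():
    print("{}: {}".format(key, value))
```

The behavior matches the explicit membership test above; which form reads better is largely a matter of taste.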
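There's more: defaultdict
- Python's collections module provides a defaultdict class. It takes a function as an argument and uses that function's return value to initialize any key that has not been seen before.
- In the example below, the return value of num() serves as the default initial value, so each count in the output is the corresponding count above plus 110.
- In practice, Python's built-in int or float is commonly passed instead; both return zero when called with no arguments.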
from collections import defaultdict
def num():
    # Default value given to any word not yet in the dictionary.
    return 110
sentence = "Peter Piper picked a peck of pickled peppers A peck of pickled \
peppers Peter Piper picked If Peter Piper a peck of picked \
peppers Wheres the peck of pickled peppers Peter Piper Picked"
word_dict = defaultdict(num)
for word in sentence.split():
    # A missing key is first set to num(), then incremented.
    word_dict[word] += 1
for key, value in word_dict.items():
    print("{}: {}".format(key, value))
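For the int case mentioned above, a minimal sketch might look like the following. Because int() returns 0, the counts are plain frequencies with no offset. The helper name count_words and the short sample text are assumptions for illustration only.

```python
from collections import defaultdict

def count_words(text):
    """Count whitespace-separated words, using int as the default factory."""
    counts = defaultdict(int)  # int() returns 0, so unseen words start at 0
    for word in text.split():
        counts[word] += 1
    return counts

counts = count_words("a peck of pickled peppers a peck")
for key, value in counts.items():
    print("{}: {}".format(key, value))
```

Note that merely reading a missing key from a defaultdict, such as counts["unknown"], inserts that key with the default value, so membership tests with "in" are safer when only checking.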
There's more: the Counter subclass
- Counter can also be used to compute word frequencies. It is a dict subclass, so the result is also a dictionary.
from collections import Counter
sentence = "Peter Piper picked a peck of pickled peppers A peck of pickled \
peppers Peter Piper picked If Peter Piper a peck of picked \
peppers Wheres the peck of pickled peppers Peter Piper Picked"
words = sentence.split()
# Counter tallies every element of the list in one call.
word_count = Counter(words)
for key, value in word_count.items():
    print("{}: {}".format(key, value))
# print(word_count)
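Beyond the dictionary interface, Counter offers a most_common() method that returns (word, count) pairs sorted from most to least frequent. The sketch below uses a shortened sample text chosen for illustration, not the sentence above.

```python
from collections import Counter

text = "a peck of pickled peppers a peck a"
word_count = Counter(text.split())

# The two most frequent words, as a list of (word, count) tuples.
top_two = word_count.most_common(2)
print(top_two)

# Looking up a word that never occurred returns 0 instead of raising KeyError.
missing = word_count["pepper"]
print(missing)
```

Unlike defaultdict, looking up a missing word in a Counter does not add it to the dictionary.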