Lab exercise: word frequency count (optional)
Given a piece of text, for example "who have an apple apple is free free is money you know", count how many times each word occurs. (Hint: use a regular expression to remove punctuation and whitespace.)
Source code
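import re

# Match every character that is not an ASCII letter, a digit, or a common Chinese character
reg = "[^0-9A-Za-z\u4e00-\u9fa5]"
word = "hello world!I'am cc.hello cc!"
# Replace punctuation with spaces
x = re.sub(reg, ' ', word)
print(x)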
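# Extract the words and store them in a list
words = []
current = ''
for ch in x:
    if ch != ' ':
        current += ch
    elif current:
        # A space ends the current word; consecutive spaces add nothing
        words.append(current)
        current = ''
# Keep the last word when the text does not end with a separator
if current:
    words.append(current)
# Store the result in a dictionary: the key is the word, the value is its count
counts = {}
for w in words:
    if w not in counts:
        counts[w] = words.count(w)
print(counts)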
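The same count can probably be written more compactly with the standard library. The following is only a sketch, not part of the original exercise solution: the function name count_words is made up for illustration, and it swaps the character-by-character loop for re.findall plus collections.Counter. It assumes the same character classes as the pattern above, so a run of letters, digits, or Chinese characters counts as one word.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each word in text to its number of occurrences."""
    # Collect runs of ASCII letters, digits, or common Chinese characters;
    # everything else (punctuation, whitespace) acts as a separator.
    tokens = re.findall(r"[0-9A-Za-z\u4e00-\u9fa5]+", text)
    return Counter(tokens)


counts = count_words("who have an apple apple is free free is money you know")
print(dict(counts))
```

Because findall only returns non-empty matches, consecutive separators and a missing trailing separator need no special handling here.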