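Lab exercise: word frequency count (optional). Given a piece of text, for example "who have an apple apple is free free is money you know", count how many times each word occurs. (Hint: use a regular expression to remove punctuation and whitespace.)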
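import re

# Match every character that is not a digit, an ASCII letter, or a Chinese character.
reg = "[^0-9A-Za-z\u4e00-\u9fa5]"
word = "hello world!I'am cc.hello cc!"

# Remove punctuation by replacing each unwanted character with a space.
# Note that the apostrophe is replaced too, so "I'am" becomes "I" and "am".
x = re.sub(reg, ' ', word)
print(x)

# Extract the words and store them in a list.
words = []
current = ''
for ch in x:
    if ch != ' ':
        current += ch
    elif current:  # skip empty tokens produced by consecutive spaces
        words.append(current)
        current = ''
if current:  # keep the last word when the text does not end with a separator
    words.append(current)

# Store the result in a dictionary: the key is the word, the value is its count.
counts = {}
for w in words:
    if w not in counts:
        counts[w] = words.count(w)
print(counts)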
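The same count can probably be obtained more compactly with the standard library. The following is only a minimal sketch, not the approach from the original post: it assumes that each run of digits, ASCII letters, or Chinese characters is one word, and the function name count_words is a hypothetical helper introduced for illustration. It uses re.findall to collect the words directly and collections.Counter to tally them, applied to the example sentence from the exercise.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each word in text to its number of occurrences.

    count_words is a hypothetical helper, not part of the original post.
    """
    # Same character classes as the pattern above, but matched positively:
    # each run of digits, ASCII letters, or Chinese characters is one token.
    tokens = re.findall(r"[0-9A-Za-z\u4e00-\u9fa5]+", text)
    return Counter(tokens)


sample = "who have an apple apple is free free is money you know"
counts = count_words(sample)

# Show the three most frequent words together with their counts.
print(counts.most_common(3))
```

Matching the words positively avoids the empty strings that splitting on single spaces can produce, and Counter makes one pass over the tokens instead of calling count once per distinct word.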
————————————————
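Copyright notice: this is an original article by CSDN blogger 「常丨CHENG」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_45710713/article/details/121172385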