..........................................................................................................................................................
The symbol set in "stopwords.txt" can help us remove punctuation marks.
..........................................................................................................................................................

..........................................................................................................................................................
The file also contains special characters.
..........................................................................................................................................................

..........................................................................................................................................................
It also includes Chinese modal particles and other uninformative words, 777 lines in total.
..........................................................................................................................................................

..........................................................................................................................................................
How to use it:
..........................................................................................................................................................
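import csv
import jieba

# Load the stop words (punctuation, special characters, filler words), one per line.
with open('stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}
# print(stopwords)

# Segment the text in data2.txt with jieba.
code = []
with open('data2.txt', 'r', encoding='utf-8') as f1:
    for i in f1.read().split(' '):
        code += jieba.lcut(i)

# Count every token that is neither a stop word nor pure whitespace
# (newline tokens would otherwise be counted as words).
d = {}
for word in code:
    if word.strip() and word not in stopwords:
        d[word] = d.get(word, 0) + 1

# Sort by frequency, highest first, and keep at most the top 5 words.
ls = list(d.items())
ls.sort(key=lambda s: s[-1], reverse=True)
p = [word for word, _ in ls[:5]]

# Write the top words as the first row of data1.csv.
with open('data1.csv', 'w', encoding='utf-8', newline='') as out:
    write = csv.writer(out)
    write.writerow(p)  # first row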
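The counting and top-5 steps above can also be sketched with `collections.Counter` from the standard library. This is only an illustrative variant, not the code from the post. The helper name `top_words` and the sample tokens and stop words are made up for the example, and the input is assumed to be already segmented (for instance by `jieba.lcut`).

```python
from collections import Counter


def top_words(tokens, stopwords, n=5):
    """Return the n most frequent tokens that are not stop words.

    `top_words` is a hypothetical helper for this sketch.
    """
    counts = Counter(
        tok for tok in tokens
        # Skip whitespace-only tokens and anything in the stop-word set.
        if tok.strip() and tok not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]


# Made-up, already segmented input; jieba.lcut would return a list like this.
tokens = ["数据", "的", "分析", ",", "数据", "了", "分析", "数据", "\n"]
stopwords = {"的", "了", ","}

print(top_words(tokens, stopwords, n=2))  # ['数据', '分析']
```

Keeping the stop words in a `set` makes each membership check constant time, which helps when the text is long and the list has several hundred entries.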
..........................................................................................................................................................
Download link for stopwords.txt:
..........................................................................................................................................................
https://pan.baidu.com/s/19KZpL6HU3hi4-XN3IXhuNg?pwd=hh33
..........................................................................................................................................................
Guff_hys: Python data structures, big data development, Python practical training projects (CSDN blog)




