Removing Punctuation from Text in Python (Removing Special Characters)

The symbol set in "stopwords.txt" can be used to remove punctuation marks from text.
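
As a minimal sketch of the idea, the snippet below filters a string character by character against a set of symbols. The `SAMPLE_SYMBOLS` set and the `strip_symbols` helper are illustrative assumptions standing in for the contents of stopwords.txt; they are not taken from the original file.

```python
# Hypothetical stand-in for the punctuation entries listed in stopwords.txt
SAMPLE_SYMBOLS = set(",。!?、;:,.!?;:\"'()()《》【】")


def strip_symbols(text, symbols=SAMPLE_SYMBOLS):
    """Return text with every character found in symbols removed."""
    return "".join(ch for ch in text if ch not in symbols)


print(strip_symbols("你好,世界!Hello, world."))  # 你好世界Hello world
```

Filtering per character only handles single-character symbols. Multi-character stop words need the text to be segmented into words first, which is what the full example further down does with jieba.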

The file also contains special characters.

It also contains Chinese modal particles and other low-information filler characters, 777 lines in total.
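
Because the file mixes punctuation, special characters, and filler words, one per line, it can help to load it into a set and check how many entries it holds. The sketch below assumes a UTF-8 file with one entry per line. `load_stopwords` is a hypothetical helper name, and the demo writes a tiny temporary file with made-up content instead of reading the real stopwords.txt.

```python
import os
import tempfile


def load_stopwords(path):
    """Read one stop word per line, skip blank lines, and return a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


# Demo with a small temporary file (assumed sample content, not the real list)
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(",\n。\n#\n啊\n的\n\n的\n")
    demo_path = tmp.name

demo_stopwords = load_stopwords(demo_path)
os.remove(demo_path)

print(len(demo_stopwords))  # 5: the blank line and the duplicate are dropped
```

A set makes each `word not in stopwords` check a constant-time lookup, which matters once the text being filtered gets long.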

How to use it:

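import csv

import jieba

# Load the stop-word list (punctuation, special characters, filler words), one entry per line
with open('stopwords.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}
# print(stopwords)

# Read the source text, split it on whitespace, and segment each chunk with jieba
code = []
with open('data2.txt', 'r', encoding='utf-8') as f1:
    for i in f1.read().split():
        code += jieba.lcut(i)

# Count every token that is not in the stop-word list
d = {}
for word in code:
    if word not in stopwords:
        d[word] = d.get(word, 0) + 1

# Sort by frequency, highest first, and keep at most the top five words
ls = list(d.items())
ls.sort(key=lambda s: s[-1], reverse=True)
p = [word for word, _ in ls[:5]]

# newline='' keeps the csv module from writing blank rows on Windows
with open('data1.csv', 'w', encoding='utf-8', newline='') as out:
    csv.writer(out).writerow(p)  # first row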
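
The counting and sorting steps above can also be written with `collections.Counter` from the standard library. This is an alternative sketch, not the author's method: `top_words` is a hypothetical helper, and the token list is hard-coded sample data standing in for jieba's output.

```python
from collections import Counter


def top_words(tokens, stopwords, n=5):
    """Return up to n most frequent tokens that are not stop words."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]


# Sample tokens, as a word segmenter might produce them (assumed data)
tokens = ["数据", ",", "的", "分析", "数据", "。", "学习", "数据", "分析"]
print(top_words(tokens, {",", "。", "的"}, n=2))  # ['数据', '分析']
```

`most_common(n)` returns fewer than `n` items when there are not enough distinct tokens, so the helper does not fail on short inputs.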

Download link for stopwords.txt:

https://pan.baidu.com/s/19KZpL6HU3hi4-XN3IXhuNg?pwd=hh33

Guff_hys: Python data structures, big data development study, Python practical training projects (CSDN blog)
