How to use custom stopwords in Python: [python] jieba word segmentation, stopword removal, and a custom dictionary

Segment text with jieba, remove stopwords, and load a custom user dictionary.
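The script reads three UTF-8 files: `dict.txt` (the custom dictionary), `stopwords.txt` (one stopword per line) and `gp.txt` (the input text, one sentence or paragraph per line). jieba's user dictionary takes one entry per line: the word, then an optional frequency, then an optional part-of-speech tag, separated by spaces. The sketch below writes sample versions of all three files so the script can be tried end to end. The file contents are made up for illustration and are not from the original post.

```python
# Hypothetical sample inputs for trying the script end to end.
# dict.txt follows jieba's user-dictionary format: "word [freq] [POS tag]" per line.
sample_files = {
    "dict.txt": "自定义词典 10 n\n停用词表\n",
    "stopwords.txt": "的\n了\n，\n。\n",
    "gp.txt": "使用jieba分词的时候，可以加载自定义词典。\n\n停用词表可以过滤无意义的词。\n",
}

for name, content in sample_files.items():
    # Write each file as UTF-8, the encoding the script reads with
    with open(name, "w", encoding="utf-8") as fh:
        fh.write(content)
```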

```python
# encoding=utf-8
import jieba

filename = "gp.txt"               # input text, one sentence/paragraph per line
stopwords_file = "stopwords.txt"  # one stopword per line
output_file = "gp2.txt"           # segmented output, words separated by spaces

# Load the custom dictionary so jieba keeps these terms as single tokens
jieba.load_userdict("dict.txt")

# Read stopwords into a set for fast membership tests; skip blank lines.
# "with" closes the file automatically (the bare "stop_f.close" never ran).
with open(stopwords_file, "r", encoding="utf-8") as stop_f:
    stop_words = {line.strip() for line in stop_f if line.strip()}

# print(len(stop_words))

result = []
with open(filename, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Precise mode (cut_all=False) returns a generator of tokens
        seg_list = jieba.cut(line, cut_all=False)
        # Drop stopwords and whitespace-only tokens such as '\t'
        words = [w for w in seg_list if w not in stop_words and w.strip()]
        result.append(" ".join(words))

# Write one segmented sentence per line, skipping lines that became empty
with open(output_file, "w", encoding="utf-8") as fw:
    for sentence in result:
        data = sentence.strip()
        if data:
            fw.write(data + "\n")

print("end")
```
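The stopword-filtering step can be pulled out into two small pure functions, which makes it testable without jieba or any input files. This is a minimal sketch: the names `load_stopwords` and `filter_tokens` are hypothetical, and the token list is a hand-written stand-in for what `jieba.cut` would return.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a stopword set from lines, stripping whitespace and skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_tokens(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep tokens that are neither stopwords nor whitespace-only."""
    return [tok for tok in tokens if tok.strip() and tok not in stop_words]


# Usage with hand-written tokens (in the script they come from jieba.cut)
stop_words = load_stopwords(["的\n", "  了 ", "\n", "是"])
tokens = ["我", "的", "\t", "书", "了", " ", "很", "好"]
print(" ".join(filter_tokens(tokens, stop_words)))  # 我 书 很 好
```

Storing the stopwords in a set rather than a list makes each `in` check constant time on average, which matters once the stopword list grows to thousands of entries.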
