Python 3.6: Using jieba to segment Chinese text, remove stopwords, and count word frequencies

from collections import Counter

import jieba

# jieba.load_userdict('userdict.txt')  # optionally load a custom dictionary

# Build the stopword list
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords

# Segment a sentence and drop stopwords
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    stopwords = stopwordslist('stop_words.txt')  # path to the stopword file
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr

inputs = open('wordsbag2.txt', 'r', encoding='utf-8')   # path of the file to process
outputs = open('result2.txt', 'w', encoding='utf-8')    # path of the segmented output file
for line in inputs:
    line_seg = seg_sentence(line)  # the return value is a space-separated string
    outputs.write(line_seg)
outputs.close()
inputs.close()

# WordCount
with open('result2.txt', 'r', encoding='utf-8') as fr:  # read the file with stopwords already removed
    # the text is already segmented and space-separated, so just split it
    data = fr.read().split()
data = dict(Counter(data))

with open('wordcount2.txt', 'w', encoding='utf-8') as fw:  # file that stores the word counts
    for k, v in data.items():
        fw.write('%s,%d\n' % (k, v))
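The loop above writes the counts in whatever order the dictionary happens to hold them. If a frequency-ranked file is more useful, `Counter.most_common` returns the pairs already sorted. Below is a minimal sketch under that assumption; `result2.txt` is the segmented file produced above, while `wordcount_sorted.txt` is a hypothetical output name.

```python
from collections import Counter

# Minimal sketch: same counting step as above, but written out sorted by
# frequency, highest first. result2.txt comes from the script above;
# wordcount_sorted.txt is a hypothetical output filename.
with open('result2.txt', 'r', encoding='utf-8') as fr:
    counts = Counter(fr.read().split())

with open('wordcount_sorted.txt', 'w', encoding='utf-8') as fw:
    for word, freq in counts.most_common():
        fw.write('%s,%d\n' % (word, freq))
```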

Here is how to use Python to segment《西游记》(Journey to the West), filter out stopwords, and count word frequencies. A stopword list is needed first. The steps are as follows:

1. Download a stopword list. You can take one from an NLP toolkit such as NLTK, or find one on GitHub; the GitHub list is used here.

```python
import urllib.request
import os

if not os.path.exists('stopwords.txt'):
    print('Downloading stopwords...')
    url = 'https://raw.githubusercontent.com/goto456/stopwords/master/stopwords.txt'
    urllib.request.urlretrieve(url, 'stopwords.txt')
    print('Stopwords download complete.')
```

2. Read the text of Journey to the West.

```python
with open('journey_to_the_west.txt', 'r', encoding='utf-8') as f:
    text = f.read()
```

3. Segment the text, here using the jieba library.

```python
import jieba

words = jieba.lcut(text)
```

4. Filter out the stopwords.

```python
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords = f.read().split('\n')

filtered_words = []
for word in words:
    if word not in stopwords and word != '\n':
        filtered_words.append(word)
```

5. Count the word frequencies.

```python
from collections import Counter

word_count = Counter(filtered_words)
print(word_count.most_common(20))
```

The full script:

```python
import urllib.request
import os
import jieba
from collections import Counter

if not os.path.exists('stopwords.txt'):
    print('Downloading stopwords...')
    url = 'https://raw.githubusercontent.com/goto456/stopwords/master/stopwords.txt'
    urllib.request.urlretrieve(url, 'stopwords.txt')
    print('Stopwords download complete.')

with open('journey_to_the_west.txt', 'r', encoding='utf-8') as f:
    text = f.read()

words = jieba.lcut(text)

with open('stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords = f.read().split('\n')

filtered_words = []
for word in words:
    if word not in stopwords and word != '\n':
        filtered_words.append(word)

word_count = Counter(filtered_words)
print(word_count.most_common(20))
```

This script prints the 20 most frequent words and their counts.
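One further note: the first script contains a commented-out `jieba.load_userdict('userdict.txt')` line. If names or domain-specific terms in your text come out split into fragments, a user dictionary tells jieba to keep them whole before the counting step. A minimal sketch, assuming a hypothetical `userdict.txt` in jieba's one-entry-per-line format:

```python
import jieba

# Hypothetical userdict.txt, one entry per line in jieba's
# "word [frequency] [POS tag]" format, e.g.:
#   美猴王 10 nr
#   花果山 10 ns
jieba.load_userdict('userdict.txt')

# Single terms can also be registered directly in code.
jieba.add_word('齐天大圣')

# Registered terms come out as single tokens when segmenting.
print(jieba.lcut('齐天大圣美猴王回到了花果山'))
```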
