Reading a file in Python, segmenting it with jieba using a custom dictionary and stop words, and writing the result to a file

青风learing

Published 2019-04-26 12:10:59


Category: Undergraduate thesis code. Tags: python, jieba, file reading, file writing, dictionary

Article link: https://blog.csdn.net/weixin_44301621/article/details/89542245


The code is as follows:

# -*- encoding=utf-8 -*-
import jieba.analyse
import jieba
import pandas as pd

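# Load the custom dictionary
jieba.load_userdict('dict.txt')
# Load the custom stop words
jieba.analyse.set_stop_words('stop_words.txt')
# Remove or normalize punctuation such as Chinese and English commas and periods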
def clearSen(comment):
    comment = comment.strip()
    comment = comment.replace('、', '')
    comment = comment.replace('，', '。')
    comment = comment.replace('《', '。')
    comment = comment.replace('》', '。')
    comment = comment.replace('～', '')
    comment = comment.replace('…', '')
    comment = comment.replace('\r', '')
    comment = comment.replace('\t', ' ')
    comment = comment.replace('\f', ' ')
    comment = comment.replace('/', '')
    comment = comment.replace('、', ' ')
    comment = comment.re
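
The listing above breaks off partway through `clearSen`, so the reading, segmenting, and writing steps named in the title are not shown. What follows is a minimal sketch of how those steps might look, not the author's code. It assumes a plain UTF-8 text input with one comment per line and a stop-word file with one word per line; the helper names `load_stop_words` and `segment_file` are hypothetical. Note that `jieba.analyse.set_stop_words` only affects keyword extraction (`jieba.analyse.extract_tags`), so the sketch filters stop words out of the segmentation result explicitly.

```python
def load_stop_words(path):
    """Read one stop word per line into a set (UTF-8 assumed)."""
    with open(path, encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}


def segment_file(in_path, out_path, stop_words, cut=None):
    """Segment each non-empty line of in_path and write the remaining
    tokens, joined by spaces, as one line of out_path."""
    if cut is None:
        # Imported lazily so another tokenizer can be passed in instead.
        import jieba
        cut = jieba.lcut
    with open(in_path, encoding='utf-8') as fin, \
            open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            # Drop whitespace-only tokens and stop words.
            words = [w for w in cut(line)
                     if w.strip() and w not in stop_words]
            fout.write(' '.join(words) + '\n')
```

With `dict.txt` and `stop_words.txt` as in the listing, and hypothetical file names `comments.txt` and `result.txt`, a call would look like `segment_file('comments.txt', 'result.txt', load_stop_words('stop_words.txt'))`, made after `jieba.load_userdict('dict.txt')`. A cleaning function such as `clearSen` could be applied to each line before it is passed to `cut`.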
