First, the data has to be preprocessed: segment the text with jieba, filter out stop words, and save the result to a file. This example uses 天龙八部.txt.
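Before running the full script, a one-liner shows what jieba segmentation produces (the sample sentence and its expected output come from jieba's own documentation):

import jieba

# jieba.lcut returns the segmentation as a list of strings.
print(jieba.lcut("我来到北京清华大学"))
# ['我', '来到', '北京', '清华大学']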
import jieba
import jieba.analyse
import jieba.posseg as pseg

# Load the stop-word list (one word per line); a set gives O(1) lookups.
stop_words = set()
with open('data/stopwords.txt', 'r', encoding='UTF-8') as f:
    for line in f:
        line = line.replace("\n", "").replace("\r", "").strip()
        if line:
            stop_words.add(line)
print(stop_words)

cut_result = open("data/after_cut.txt", 'w', encoding='UTF-8')

def cut_words(filepath):
    with open(filepath, 'r', encoding='UTF-8') as f:
        for line in f:
            word_list = []
            line = line.replace("\n", "").replace("\r", "").strip()
            words = jieba.lcut(line)
            for word in words:
                if word not in stop_words:
                    word_list.append(word)
            # Write one space-separated line per input line.
            # (The original snippet is truncated here; this completion
            # follows the stated goal of saving the result to a file.)
            cut_result.write(" ".join(word_list) + "\n")

cut_words('data/天龙八部.txt')  # path assumed to match the data/ layout above
cut_result.close()
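Once cut_words has run, every line of data/after_cut.txt holds one space-separated, stop-word-filtered line of the novel. A quick sanity check (a minimal sketch; it assumes the script above has already produced the file):

# Print the first three segmented lines to verify the output format.
with open('data/after_cut.txt', 'r', encoding='UTF-8') as f:
    for i, line in enumerate(f):
        print(line.strip())
        if i >= 2:
            break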