Python: training word vectors and clustering
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
__author__ = '未昔/angelfate'
__date__ = '2019/8/14 17:06'
import logging
import os
import re

import jieba
import pandas as pd
from gensim.models import word2vec


class Word2Vec_Test(object):
    def __init__(self):
        self.csv_path = 'DouBanFilm_FanTanFengBao4.csv'  # source CSV
        self.txt_path = 'comment.txt'  # plain-text file of comments
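The constructor stores a CSV path and a text path, and the first step is to copy the comment column of the CSV into that text file. The sketch below is one possible way to do it, not the author's own code. It uses the standard-library `csv` module so that it runs without extra dependencies; `pd.read_csv` would work equally well. The function name `export_comments` and the column name `comment` are assumptions for illustration, so replace the column name with the real header of your CSV.

```python
import csv


def export_comments(csv_path, txt_path, column="comment"):
    """Write each non-empty value of `column` to `txt_path`, one per line.

    `column` is a hypothetical header name; adjust it to the real CSV.
    Returns the number of comments written.
    """
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as fin, \
            open(txt_path, "w", encoding="utf-8") as fout:
        for row in csv.DictReader(fin):
            # Collapse all whitespace, including line breaks inside a
            # quoted field, so that each comment occupies exactly one line.
            text = " ".join((row.get(column) or "").split())
            if text:
                fout.write(text + "\n")
                written += 1
    return written
```

With the paths from the constructor, the call would look like `export_comments('DouBanFilm_FanTanFengBao4.csv', 'comment.txt')`. Keeping one comment per line makes the later segmentation step easier, because the text file can then be processed line by line.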
```
First, extract the comment column from the CSV file into a txt file.
1. Read the comment text from the txt file
    def read_file(self):
        """
        Train the model.
        :return:
        """
        # jieba.load_userdict(self.txt_path)
        logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO,
                            filename='test_01.log')
        filename = self.txt_path  # input text file
        pre, ext = os.path.splitext(filename)  # split into prefix and extension: pre='comment', ext='.txt'
        corpus = pre + '_seg' + ext  # e.g. 'comment_seg.txt'
        fin = open(filename, encoding='utf8').read().strip(' ').strip('\n').repl