Implementing Word2vec in Python: Study Notes

Hao973

Article link: https://blog.csdn.net/feng973/article/details/81067688

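Summary (auto-generated by CSDN): detailed study notes on implementing Word2vec with Python's gensim library. They cover the whole process from corpus preprocessing to model training and testing, including experiments on the Chinese and English Wikipedia datasets, and link to the code on GitHub.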
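To make "model training" concrete: in its skip-gram variant, Word2vec learns from (center word, context word) pairs taken from a sliding window over each tokenized sentence. gensim's `Word2Vec` generates these pairs internally. Its `window` parameter sets the maximum distance, and `sg=1` selects skip-gram, while the default `sg=0` selects CBOW. The author's `model_train.py` is not visible in this post, so which variant and settings it uses is unknown. The pure-Python sketch below only illustrates pair generation. The function name `skipgram_pairs` and the sample tokens are hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return every (center, context) pair whose positions differ by at most `window`.

    Illustrative only: gensim builds such pairs inside its own training loop.
    """
    pairs = []
    for i, center in enumerate(sentence):
        # Clamp the window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, sentence[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical tokens, as a segmenter might produce them.
    tokens = ["张无忌", "练成", "九阳", "神功"]
    for pair in skipgram_pairs(tokens, window=1):
        print(pair)
```

This fixed-window version is a simplification. By default, gensim's training also shrinks the window randomly for each word and downsamples very frequent words (the `sample` parameter).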

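References:
A Python implementation of Chinese word2vec (中文word2vec的python实现)
A first Python implementation of word2vec (python初步实现word2vec)
Word2Vec experiments on Chinese and English Wikipedia corpora (中英文维基百科语料上的Word2Vec实验)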

GitHub code repository

1 Directory structure:

[.../vord2vec]$ ls
data  model_train.py  word2vec_test.py  word_cut.py
[.../vord2vec]$ ls ./data/*
./data/倚天屠龙记.txt
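The test script (`word2vec_test.py`) is also not visible. Checks of this kind usually compare word vectors by cosine similarity. In gensim, `model.wv.similarity(w1, w2)` returns that value, and `model.wv.most_similar(word, topn=...)` ranks the vocabulary by it. Below is a minimal stdlib sketch of the computation. It assumes made-up 3-dimensional toy vectors, which are hypothetical and not trained output. `cosine` and `most_similar` here are illustrative helpers, not gensim's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word: str, vectors: Dict[str, List[float]], topn: int = 3) -> List[Tuple[str, float]]:
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scores = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Hypothetical toy vectors, invented for illustration only.
    toy = {
        "张无忌": [0.9, 0.1, 0.0],
        "赵敏": [0.8, 0.3, 0.1],
        "屠龙刀": [0.1, 0.9, 0.2],
    }
    print(most_similar("张无忌", toy, topn=2))
```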

2 Contents of word_cut.py:

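# coding: utf-8
# This script loads the raw text file and segments it into words (Python 3)
import jieba

file_name = './data/倚天屠龙记.txt'
cut_file = './data/倚天屠龙记_cut.txt'

# The Python 2 lines reload(sys) / sys.setdefaultencoding('utf8') are removed:
# Python 3 strings are Unicode, and open() takes the encoding directly.

# Segment the raw corpus; the segmented text serves as the training corpus for the model
def cut_txt(old_file, cut_file):
    print('cut_txt begin.')
    try:
        # read the file content
        with open(old_file, 'r', encoding='utf-8') as fi:
            text = fi.read()  # get the text content

        # cut words
        new_text = jieba.cut(text, cut_all=False)  # accurate mode
        str_out = ' '.join(new_text).replace('，', '').replace('。', '').replace('？', '')
        # The original listing is cut off mid-line above (it chains more .replace()
        # calls). The lines below are an assumed minimal completion, not the author's code.
        with open(cut_file, 'w', encoding='utf-8') as fo:
            fo.write(str_out)
    except OSError as e:
        print(e)


if __name__ == '__main__':
    cut_txt(file_name, cut_file)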
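The chained `.replace()` calls in `cut_txt` delete punctuation one character at a time. An alternative, not taken from the original post, is a single `str.translate` table applied line by line. That way each output line stays one sentence of space-separated tokens, which is the format gensim's `LineSentence` reads. The punctuation set `CN_PUNCT` and the helper names below are hypothetical. The input tokens stand in for `jieba.cut` output so the sketch runs without jieba installed.

```python
from typing import Iterable, List

# Hypothetical punctuation set for illustration; extend it to suit your corpus.
CN_PUNCT = "，。？！、；：“”‘’（）《》…"

# str.maketrans with a third argument maps each listed character to None (deletion).
STRIP_TABLE = str.maketrans("", "", CN_PUNCT)


def clean_tokens(tokens: Iterable[str]) -> List[str]:
    """Delete punctuation from each token and drop tokens left empty or whitespace-only."""
    cleaned = []
    for tok in tokens:
        tok = tok.translate(STRIP_TABLE).strip()
        if tok:
            cleaned.append(tok)
    return cleaned


def to_corpus_line(tokens: Iterable[str]) -> str:
    """Join one sentence's cleaned tokens with single spaces (one sentence per line)."""
    return " ".join(clean_tokens(tokens))


if __name__ == "__main__":
    # Hypothetical tokens, standing in for jieba.cut(line, cut_all=False) output.
    sample = ["张无忌", "道", "：", "“", "你", "是", "谁", "？", "”"]
    print(to_corpus_line(sample))
```

Dropping whitespace-only tokens also removes the space and newline tokens that jieba can emit. This is why the sketch works on one line at a time: it keeps the sentence boundaries that `LineSentence` relies on.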
