https://blog.csdn.net/lilong117194/article/details/82849054
https://www.jiqizhixin.com/articles/2018-05-15-10
https://zhuanlan.zhihu.com/p/40016964
According to Andrew Ng's video: for a small corpus, CBOW is recommended; for a large corpus, use skip-gram.
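The two architectures differ in how training examples are built from a sliding window: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its whole context. A minimal pure-Python toy (not gensim internals, just an illustration of the pair construction):

```python
def skipgram_pairs(tokens, window=2):
    # skip-gram: one (center, context) pair per context word
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    # CBOW: one (context_list, center) example per position
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

sent = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sent, window=1))
print(cbow_examples(sent, window=1))
```

Note that skip-gram generates one training example per (center, context) pair, which is one intuition for why it can squeeze more out of a small corpus, while CBOW averages the context and trains faster on large corpora.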
word2vec training parameters:
min_count: minimum word frequency
min_count is for pruning the internal dictionary: words that occur fewer than min_count times are dropped from the vocabulary. The default value is min_count=5.
model = gensim.models.Word2Vec(sentences, min_count=10)
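A small pure-Python sketch of what min_count pruning amounts to (illustrative only, not gensim's actual implementation):

```python
from collections import Counter

def build_vocab(sentences, min_count=5):
    # Count every token, then keep only words seen at least min_count times.
    counts = Counter(w for sent in sentences for w in sent)
    return {w: c for w, c in counts.items() if c >= min_count}

sentences = [["cat", "sat"], ["cat", "ran"], ["dog", "sat"]]
print(build_vocab(sentences, min_count=2))  # rare words "ran" and "dog" are pruned
```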
size: word vector dimensionality (renamed to vector_size in gensim 4.x)
Bigger size values require more training data, but can lead to better (more accurate) models. Reasonable values are in the tens to hundreds.
workers: number of worker threads (default 3), used for training parallelization, to speed up training:
The workers parameter only has an effect if you have Cython installed. Without Cython, you’ll only be able to use one core because of the GIL (and word2vec training will be miserably slow).
Memory: memory usage
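Per the gensim tutorial's rule of thumb, the model keeps roughly three matrices, each vocab_size × size single-precision (4-byte) floats, so memory can be estimated up front. A quick sketch of that arithmetic:

```python
def estimated_bytes(vocab_size, size, n_matrices=3, bytes_per_float=4):
    # Rule of thumb from the gensim tutorial: ~3 matrices of
    # vocab_size x size single-precision floats.
    return vocab_size * size * bytes_per_float * n_matrices

# e.g. 100,000 vocabulary words with 200-dimensional vectors:
gb = estimated_bytes(100_000, 200) / 1024**3
print(f"~{gb:.2f} GB")
```

This is why min_count matters for memory as well as quality: pruning the vocabulary directly shrinks every matrix.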
Evaluating
model.accuracy('./datasets/questions-words.txt')
Signature: accuracy(questions, restrict_vocab=30000, most_similar=None, case_insensitive=True)
(In gensim 4.x, accuracy was replaced by KeyedVectors.evaluate_word_analogies.)
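The questions-words evaluation poses analogies like "man : king :: woman : ?" and checks whether vector arithmetic recovers the answer. A pure-Python toy of that test with hand-crafted 2-D vectors (illustrative only; real evaluations use the trained embeddings):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(vectors, a, b, c):
    # Find d such that vec(b) - vec(a) + vec(c) is closest to vec(d),
    # excluding the three query words themselves (as the analogy test does).
    target = [bb - aa + cc for aa, bb, cc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Tiny hand-made toy vectors (illustrative only):
vecs = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.1],
    "queen": [3.0, 1.1],
    "apple": [0.0, 3.0],
}
print(solve_analogy(vecs, "man", "king", "woman"))  # expected: "queen"
```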
Online training / Resuming training
Training loss computation (compute_loss=True/False)
The parameter compute_loss can be used to toggle computation of loss while training the Word2Vec model. The computed loss is stored in the model attribute running_training_loss and can be retrieved with the method get_latest_training_loss.
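A minimal pure-Python sketch of that pattern (a toy trainer, not gensim itself; the attribute and method names merely mirror the ones the gensim docs describe):

```python
class ToyTrainer:
    # Toy illustration of the compute_loss pattern: accumulate a running
    # loss only when tracking was requested at construction time.
    def __init__(self, compute_loss=False):
        self.compute_loss = compute_loss
        self.running_training_loss = 0.0

    def train_step(self, loss):
        if self.compute_loss:
            self.running_training_loss += loss

    def get_latest_training_loss(self):
        return self.running_training_loss

trainer = ToyTrainer(compute_loss=True)
for step_loss in [1.0, 0.5, 0.5]:
    trainer.train_step(step_loss)
print(trainer.get_latest_training_loss())  # 2.0
```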
For details, see:
https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py
https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.accuracy
These pages are clearly written; just search within them.