Training word vectors with gensim

https://blog.csdn.net/lilong117194/article/details/82849054
https://www.jiqizhixin.com/articles/2018-05-15-10
https://zhuanlan.zhihu.com/p/40016964

Andrew Ng's lectures suggest:
small corpus — use CBOW; large corpus — use skip-gram.

word2vec training parameters:
min_count: minimum word frequency
min_count is for pruning the internal dictionary — words appearing fewer times are dropped.
model = gensim.models.Word2Vec(sentences, min_count=10)
The default value is min_count=5.

size: word vector dimensionality (renamed vector_size in gensim 4)
Bigger size values require more training data, but can lead to better (more accurate) models. Reasonable values are in the tens to hundreds.

workers: default 3
Controls training parallelization, to speed up training.
The workers parameter only has an effect if you have Cython installed. Without Cython, you'll only be able to use one core because of the GIL (and word2vec training will be miserably slow).

Memory usage
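A back-of-the-envelope estimate, assuming (as the gensim tutorial describes) the model keeps roughly three float32 matrices of shape (vocab_size, vector_size); the numbers below are illustrative:

```python
# Rough memory estimate for the model's parameter matrices.
vocab_size = 100_000
vector_size = 200
bytes_per_float = 4   # float32
num_matrices = 3      # assumption from the gensim tutorial's description

estimate_bytes = vocab_size * vector_size * bytes_per_float * num_matrices
print(estimate_bytes / 1024**2)  # ≈ 228.9 MB
```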

Evaluating
model.accuracy('./datasets/questions-words.txt')
accuracy(questions, restrict_vocab=30000, most_similar=None, case_insensitive=True)
Note: model.accuracy() is the old API; in gensim 4.x use model.wv.evaluate_word_analogies() instead.

Online training / Resuming training

Training loss computation (compute_loss=True/False)
The parameter compute_loss can be used to toggle computation of loss while training the Word2Vec model. The computed loss is stored in the model attribute running_training_loss and can be retrieved with get_latest_training_loss().

For details see https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py

https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.accuracy
This page explains it clearly; just search within it.
