Saving and loading models
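from gensim.models import Word2Vec  # in gensim 1.0+ this loader lives on gensim.models.KeyedVectors
# vectors stored in the plain-text word2vec format
model = Word2Vec.load_word2vec_format('./data/GoogleNews-vectors-negative300.txt', binary=False)
# vectors stored in the binary word2vec format; gzipped/bz2 input works too, no need to unzip
model = Word2Vec.load_word2vec_format('./data/GoogleNews-vectors-negative300.bin', binary=True)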
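To make the binary=False case less of a black box, here is a minimal sketch of a reader for the text word2vec format, assuming the usual layout: a header line with the vocabulary size and the vector dimension, then one line per word holding the word followed by its values, separated by spaces. The function name read_word2vec_text and the toy data are made up for illustration; this is not how gensim implements its loader.

```python
import io

def read_word2vec_text(stream):
    """Parse text-format word2vec data from a file-like object into {word: [floats]}."""
    # Header line: "<vocab_size> <dim>"
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError("expected %d values, got %d" % (dim, len(parts) - 1))
        # First field is the word, the rest are the vector components
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError("header promised %d words, found %d" % (vocab_size, len(vectors)))
    return vectors

# Hypothetical toy input: 2 words, 3 dimensions, made-up numbers
toy_file = io.StringIO("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
vectors = read_word2vec_text(toy_file)
print(vectors["queen"])
```

For real files such as the GoogleNews vectors, the gensim loader is the better choice; the sketch only shows what the file contains.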
Continuing training
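import gensim
model = gensim.models.Word2Vec.load('/tmp/mymodel')  # a model previously saved with model.save()
# more_sentences: an iterable of tokenized sentences; newer gensim releases also require total_examples= and epochs=
model.train(more_sentences)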
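[Note] Models generated by the original C tool (those loaded with load_word2vec_format) cannot be trained further. They can still be used for queries and similarity, but the information needed for training, such as the vocabulary tree, is missing from them.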
Getting the vector for a word
model['computer'] # raw NumPy vector of a word
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
Computing word similarity
model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)
[('queen', 0.50882536)]
model.doesnt_match("breakfast cereal dinner lunch".split())
'cereal'
model.similarity('woman', 'man')
0.73723527
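The three calls above all come down to cosine similarity between word vectors. The sketch below shows roughly the idea in pure Python on made-up 3-dimensional vectors: similarity is the cosine of two vectors, most_similar adds the unit vectors of the positive words, subtracts those of the negative words and ranks the remaining words by cosine to the result, and doesnt_match returns the word least similar to the mean of the group. The toy vectors and helper names are assumptions for illustration, and gensim's own implementation may differ in details.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity: dot product of the two unit vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))

def most_similar(vectors, positive, negative, topn=1):
    """Rank words by cosine to (sum of positive units - sum of negative units)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    exclude = set(positive) | set(negative)  # input words are not candidates
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

def doesnt_match(vectors, words):
    """Return the word whose vector is least similar to the mean of the group."""
    units = [unit(vectors[w]) for w in words]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return min(words, key=lambda w: cosine(vectors[w], mean))

# Made-up toy vectors: axis 0 ~ "person", axis 1 ~ "female", axis 2 ~ "royal"
toy = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [-1.0, 0.2, -0.5],
}

print(cosine(toy["woman"], toy["man"]))
print(most_similar(toy, positive=["woman", "king"], negative=["man"]))  # 'queen' ranks first here
print(doesnt_match(toy, ["man", "woman", "king", "apple"]))             # 'apple' on these toy vectors
```

The scores printed here have no relation to the GoogleNews numbers above; only the mechanics carry over.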
This article draws on http://blog.csdn.net/Star_Bob/article/details/47808499