Code that raises the error:
import gensim
wv_from_text = gensim.models.KeyedVectors.load_word2vec_format('C:/ChineseEmbeddingMin.txt', binary=False)
Full traceback:
Traceback (most recent call last):
File "E:/tecentWordsTest.py", line 9, in <module>
wv_from_text = gensim.models.KeyedVectors.load_word2vec_format('C:/ChineseEmbeddingMin.txt', binary=False)
File "D:\python3.6\lib\site-packages\gensim\models\keyedvectors.py", line 1632, in load_word2vec_format
limit=limit, datatype=datatype, no_header=no_header,
File "D:\python3.6\lib\site-packages\gensim\models\keyedvectors.py", line 1913, in _load_word2vec_format
_word2vec_read_text(fin, kv, counts, vocab_size, vector_size, datatype, unicode_errors, encoding)
File "D:\python3.6\lib\site-packages\gensim\models\keyedvectors.py", line 1817, in _word2vec_read_text
raise EOFError("unexpected end of input; is count incorrect or file otherwise damaged?")
EOFError: unexpected end of input; is count incorrect or file otherwise damaged?
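Cause:
The first line of a word-vector text file holds two numbers. The first is the number of words in the file (not the number of lines, which is one more because of the header line itself); the second is the dimension of each vector. The error above is raised because that first number does not match the number of words actually in the file: gensim tries to read as many vectors as the header declares and reaches the end of the file first.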
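To confirm that this is what is going on before touching the file, a small check along these lines can compare the header with the real contents. This is only a sketch: `check_header` is a made-up helper name, and it assumes one word per line with no blank lines in the middle of the file.

```python
def check_header(path):
    """Return (declared_count, actual_count, vector_size) for a word2vec text file.

    Hypothetical helper, not part of gensim. The file is read in binary mode
    so that unusual characters in the vocabulary cannot cause decode errors.
    """
    with open(path, "rb") as f:
        # Header line: "<word count> <vector size>"
        declared_count, vector_size = (int(x) for x in f.readline().split())
        # Every remaining non-empty line is assumed to be one word and its vector.
        actual_count = sum(1 for line in f if line.strip())
    return declared_count, actual_count, vector_size
```

If the first returned value is larger than the second, the header is overstating the word count, which matches the EOFError shown above.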
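Solution:
Press Ctrl+End to jump to the bottom of the file and see how many lines it has in total. Subtract one for the header line to get the number of words, then change the first number on the first line to that value.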
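For files too large to open comfortably in an editor, the same fix can be scripted. The following is a minimal sketch, not a gensim feature: `fix_header` and the output path are hypothetical, and it assumes the only problem is a wrong count in the header. It writes a corrected copy instead of editing the original in place.

```python
import shutil


def fix_header(src, dst):
    """Copy src to dst, replacing the header's word count with the real one.

    Hypothetical helper. Assumes one word per line after the header.
    Returns the number of words found.
    """
    with open(src, "rb") as f:
        header = f.readline().split()
        vector_size = int(header[1])  # keep the declared dimension as-is
        word_count = sum(1 for line in f if line.strip())

    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.readline()  # skip the stale header
        fout.write(f"{word_count} {vector_size}\n".encode("utf-8"))
        shutil.copyfileobj(fin, fout)  # stream the vectors through unchanged
    return word_count
```

After running something like `fix_header('C:/ChineseEmbeddingMin.txt', 'C:/ChineseEmbeddingMin_fixed.txt')` (the second path is just an example name), the corrected file can be passed to `load_word2vec_format` exactly as in the original code.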