Chinese Word Vectors Collection (中文词向量集合)
https://github.com/Embedding/Chinese-Word-Vectors
Word2vec / Skip-Gram with Negative Sampling (SGNS): dense vectors by corpus and context features
Corpus | Word | Word + Ngram | Word + Character | Word + Character + Ngram |
Baidu Encyclopedia 百度百科 | 300d | 300d | 300d | 300d |
Wikipedia_zh 中文维基百科 | 300d | 300d | 300d | 300d |
People's Daily News 人民日报 | 300d | 300d | 300d | 300d |
Sogou News 搜狗新闻 | 300d | 300d | 300d | 300d |
Financial News 金融新闻 | 300d | 300d | 300d | 300d |
Zhihu_QA 知乎问答 | 300d | 300d | 300d | 300d |
Weibo 微博 | 300d | 300d | 300d | 300d |
Literature 文学作品 | 300d | 300d | 300d | 300d |
Complete Library in Four Sections 四库全书* | 300d | 300d | N/A | N/A |
Mixed-large 综合 (Baidu Netdisk / Google Drive) | 300d / 300d | 300d / 300d | 300d / 300d | 300d / 300d |
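
The 300d entries above are dense vectors. As a rough sketch of how such a file might be read, the snippet below parses the common word2vec text layout: a header line with the word count and dimension, then one word and its values per line, separated by spaces. That layout is an assumption here, since this section does not state the file format, and `load_text_vectors`, `cosine`, and the toy 4-dimensional data are hypothetical names and values used only for illustration.

```python
import io
import math


def load_text_vectors(lines, limit=None):
    """Parse word2vec-style text: a 'count dim' header, then 'word v1 ... vdim' rows.

    Assumption: the downloaded file follows this layout; check it before relying on this.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # header[0] is the declared word count
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # optionally stop early to keep memory use small
    return dim, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy stand-in for a real file (made-up 4-dimensional values, not real embeddings).
sample = io.StringIO(
    "3 4\n"
    "中国 0.1 0.2 0.3 0.4\n"
    "北京 0.1 0.2 0.3 0.5\n"
    "苹果 -0.4 0.3 -0.2 0.1\n"
)
dim, vecs = load_text_vectors(sample)
```

With a real download, the same function could be given an open file handle, for example `open(path, encoding="utf-8")`, where `path` points at the decompressed vector file. Passing `limit` is one way to inspect the most frequent words without loading the whole vocabulary.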
Positive Pointwise Mutual Information (PPMI): sparse vectors by corpus and context features
Corpus | Word | Word + Ngram | Word + Character | Word + Character + Ngram |
Baidu Encyclopedia 百度百科 | Sparse | Sparse | Sparse | Sparse |
Wikipedia_zh 中文维基百科 | Sparse | Sparse | Sparse | Sparse |
People's Daily News 人民日报 | Sparse | Sparse | Sparse | Sparse |
Sogou News 搜狗新闻 | Sparse | Sparse | Sparse | Sparse |
Financial News 金融新闻 | Sparse | Sparse | Sparse | Sparse |
Zhihu_QA 知乎问答 | Sparse | Sparse | Sparse | Sparse |
Weibo 微博 | Sparse | Sparse | Sparse | Sparse |
Literature 文学作品 | Sparse | Sparse | Sparse | Sparse |
Complete Library in Four Sections 四库全书* | Sparse | Sparse | N/A | N/A |
Mixed-large 综合 | Sparse | Sparse | Sparse | Sparse |
*Character embeddings are provided for this corpus, since most Hanzi (Chinese characters) are words in their own right in classical Chinese.
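
The PPMI representations are listed as "Sparse" because PPMI keeps only word-context pairs whose pointwise mutual information is positive. A minimal sketch of the textbook definition, PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))), is shown below. It is not necessarily the exact variant used to build these files (smoothing or other adjustments may have been applied), and the function name `ppmi` and the toy co-occurrence counts are made up for illustration.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Return a sparse {(word, context): value} map holding only positive PMI entries."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    ctx_totals = Counter()
    for (w, c), n in pair_counts.items():
        word_totals[w] += n
        ctx_totals[c] += n

    sparse = {}
    for (w, c), n in pair_counts.items():
        if n <= 0:
            continue  # unseen pairs have no finite PMI and are left out
        # P(w,c) / (P(w) * P(c)) simplifies to n * total / (count(w) * count(c)).
        pmi = math.log(n * total / (word_totals[w] * ctx_totals[c]))
        if pmi > 0:
            sparse[(w, c)] = pmi  # negative and zero values are clipped away
    return sparse


# Hypothetical co-occurrence counts, for illustration only.
counts = {
    ("北京", "首都"): 8,
    ("北京", "的"): 4,
    ("苹果", "水果"): 6,
    ("苹果", "的"): 6,
}
matrix = ppmi(counts)
```

In this toy example the pair ("北京", "的") is dropped because a frequent function word co-occurs with "北京" less often than chance would predict, which is the effect that makes the resulting matrix sparse.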