NLP: An Introduction to Different Types of Word Vectors

How Word Embeddings Are Used
<1> Plug them into an existing system to improve its performance
<2> Analyze the word vectors directly from a semantic perspective, e.g., word similarity and semantic deviation (semantic shift); a toy sketch follows this list
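
A minimal sketch of the semantic analysis meant in <2>, measuring word similarity with cosine similarity. The 4-dimensional vectors below are made-up toy values for illustration, not real embeddings:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two word vectors; closer to 1 means more similar.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical toy vectors, just to show the computation.
v_king  = np.array([0.5, 0.8, -0.1, 0.3])
v_queen = np.array([0.45, 0.75, 0.0, 0.35])
v_apple = np.array([-0.6, 0.1, 0.9, -0.2])

print(cosine_similarity(v_king, v_queen))  # high: related words
print(cosine_similarity(v_king, v_apple))  # low: unrelated words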

An LSTM framework built on top of word vectors can achieve good performance on word segmentation, POS tagging, and NER; its accuracy is roughly on par with probabilistic graphical models, while the model itself is considerably smaller.
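
The post does not pin down the exact architecture, so the following is only a minimal PyTorch sketch of one plausible reading: an embedding layer feeding a bidirectional LSTM with a per-token linear classifier, usable for segmentation, POS tagging, or NER by swapping the tag set (all hyperparameters are illustrative):

import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    # Generic embedding -> BiLSTM -> linear tagger.
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):   # (batch, seq_len)
        x = self.embed(token_ids)   # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)         # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)          # per-token tag scores

model = BiLSTMTagger(vocab_size=50000, embed_dim=100, hidden_dim=128, num_tags=10)
scores = model(torch.randint(0, 50000, (2, 20)))  # dummy batch of 2 sentences
print(scores.shape)                               # torch.Size([2, 20, 10])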

<1> Word2Vec
|1| Network architecture
|2| CBOW: Continuous Bag-of-Words Model (predicts the center word from its surrounding context)
|3| Skip-Gram: Continuous Skip-Gram Model (predicts the surrounding context from the center word)

Training methods: a large corpus (more than 5 GB) is needed, and for Chinese it must be word-segmented first; a minimal Gensim training sketch follows this list.
{1} Google's original C implementation
{2} Python: Gensim
https://radimrehurek.com/gensim/models/word2vec.html
https://zhuanlan.zhihu.com/p/24961011
{3} Java
https://github.com/NLPchina/Word2VEC_java
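
A minimal Gensim sketch (toy corpus only; a real run needs the large pre-segmented corpus described above, and note that the vector_size parameter was named size in Gensim versions before 4.0):

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of already-segmented tokens.
sentences = [["natural", "language", "processing"],
             ["word", "vectors", "capture", "semantics"]]

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimension
    window=5,         # context window size
    min_count=1,      # keep rare words only because the toy corpus is tiny
    sg=0,             # 0 = CBOW, 1 = Skip-Gram
)
print(model.wv["word"][:5])                   # first 5 dims of a learned vector
print(model.wv.most_similar("word", topn=2))  # nearest neighbours by cosine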

<2> GloVe
References:
https://nlp.stanford.edu/projects/glove/
https://github.com/stanfordnlp/GloVe

Sample rows from a pretrained 50-dimensional GloVe file (each line is a token followed by its vector):

also 0.352 0.25323 -0.097659 0.26108 0.12976 0.33684 -0.73076 -0.42641 -0.22795 -0.083619 0.52963 0.34644 -0.32824 -0.28667 0.24876 0.22053 0.019356 -0.015447 -0.18319 -0.29729 0.11739 -0.071214 0.41086 0.013912 -0.17424 -1.5839 -0.051961 -0.18115 -0.76375 -0.17817 3.749 -0.045559 0.10721 -0.51313 0.25279 -0.051714 0.31911 0.28 -0.19937 0.17819 0.018623 0.47641 -0.15655 -0.38287 0.26989 -0.011186 -0.7244 0.036514 -0.011489 -0.025882
we 0.57387 -0.32729 0.070521 -0.4198 0.862 -0.80001 -0.40604 0.15312 -0.29788 -0.1105 -0.097119 0.59642 -0.99814 -0.28148 1.0152 0.87544 1.0282 -0.05036 0.24194 -1.1426 -0.50601 0.64976 0.74833 0.020473 0.9595 -1.9204 -0.80656 0.29247 1.0009 -0.98565 4.0094 1.0407 -0.82849 -0.4847 -0.36146 -0.39552 0.27891 0.15312 0.15848 0.018686 -0.50905 -0.22916 0.1868 0.44946 0.10229 0.21882 -0.30608 0.48759 -0.18439 0.69939
would 0.7619 -0.29773 0.51396 -0.13303 0.24156 0.066799 -0.54084 0.2071 -0.28225 -0.11638 0.21666 0.54908 -0.36744 -0.10543 0.81567 1.1743 0.56055 -0.3345 0.099767 -0.87465 0.12229 -0.18532 0.086783 -0.36343 0.008002 -2.2268 -0.20079 -0.10313 0.24318 -0.39819 3.7136 0.59088 -1.1013 -0.25292 0.0057067 -0.60475 0.35965 -0.059581 -0.029059 -0.3989 -0.52631 0.12436 0.13609 0.12699 -0.23032 -0.044567 -0.6545 0.43088 -0.22768 0.4026

Note: the pretrained GloVe vocabulary contains lowercase tokens only.
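
A minimal loader for such a file, assuming the standard glove.6B-style text format shown above and a hypothetical local path:

import numpy as np

def load_glove(path):
    # Parse a GloVe text file: each line is "<token> <d floats>".
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.50d.txt")  # hypothetical local path
print(glove["we"].shape)                # (50,)
# The vocabulary is lowercase only, so look up "We".lower(), not "We".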

<3> Differences between the methods
References:
https://arxiv.org/pdf/1411.5595.pdf
NIPS: http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf
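
For context, the core result of the NIPS paper above, paraphrased: skip-gram with negative sampling (SGNS) implicitly factorizes a shifted word-context PMI matrix, where k is the number of negative samples:

\vec{w} \cdot \vec{c} = \mathrm{PMI}(w, c) - \log k,
\qquad \mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)}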
