DL && NLP: Efficient Estimation of Word Representations in Vector Space 【Word2vec】 notes

Purpose of the paper:

To learn better word-vector representations: achieve higher accuracy on very large data sets at a much lower computational complexity (i.e. fewer network parameters).

Method:

Starting from the NNLM architecture, the non-linear hidden layer (which accounts for most of the parameter computation) is removed, and the softmax output layer is replaced with a hierarchical softmax organized as a Huffman tree, reducing the per-word output cost to roughly log n.

Results:

Compared with the word vectors learned by the earlier NNLM, this architecture for learning word vectors is simpler and more accurate; the authors also find that addition and subtraction of word vectors carries semantic meaning.
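To make that additive property concrete, here is a minimal sketch (with toy, made-up 3-dimensional vectors; real ones would come from a trained word2vec model) of the kind of analogy the paper reports, e.g. vec("king") - vec("man") + vec("woman") lying closest to vec("queen"):

```python
import numpy as np

# Toy vectors made up for illustration; real ones would come from a trained
# word2vec model with hundreds of dimensions.
vectors = {
    "king":  np.array([0.8, 0.65, 0.10]),
    "man":   np.array([0.7, 0.10, 0.05]),
    "woman": np.array([0.7, 0.15, 0.80]),
    "queen": np.array([0.8, 0.70, 0.85]),
}

def most_similar(query, vocab, exclude=()):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    best_word, best_sim = None, -1.0
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

target = vectors["king"] - vectors["man"] + vectors["woman"]
print(most_similar(target, vectors, exclude={"king", "man", "woman"}))  # -> queen
```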

Abstract

This 2013 paper first introduced the word2vec model, which is now applied in many NLP tasks. In a nutshell, word2vec represents similarity among words very well, and it can be trained on large data sets, achieving better performance than previous neural network models.

In this paper, they propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task. In the experimental results, they observe large improvements in accuracy at much lower computational cost. Furthermore, these vectors provide state-of-the-art performance on their test set for measuring syntactic and semantic word similarities.

1. Introduction

Traditional NLP systems and techniques treat words as atomic units; there is no notion of similarity between words.

1.1 Goals of the Paper

The main goal is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words.

Similar words tend to be close to each other, and they can also have multiple degrees of similarity.

It was found that the similarity of word representations goes beyond simple syntactic regularities.

2. Model Architectures

We first define the computational complexity of a model as the number of parameters that need to be accessed to fully train the model. Next, we try to maximize the accuracy while minimizing the computational complexity.
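For reference, the paper expresses the overall training complexity as proportional to

$$O = E \times T \times Q$$

where $E$ is the number of training epochs, $T$ is the number of words in the training set, and $Q$ is a term defined separately for each model architecture.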

In our model, we use hierarchical softmax, where the vocabulary is represented as a Huffman binary tree.
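A minimal sketch (not the paper's implementation) of how such a Huffman tree assigns short binary codes to frequent words; the expected code length is what brings the number of output-layer evaluations per prediction down to roughly log2 of the vocabulary size:

```python
import heapq
import itertools

def huffman_codes(word_freqs):
    """Return {word: binary code string} built from a word-frequency dict."""
    counter = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(freq, next(counter), {word: ""}) for word, freq in word_freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

codes = huffman_codes({"the": 50, "cat": 10, "sat": 8, "on": 20, "mat": 5})
print(codes)  # the most frequent words receive the shortest codes
```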

3. New Log-Linear Models (the log comes from the Huffman tree)

The main observation from the previous section was that most of the complexity is caused by the non-linear hidden layer in the model.

The architectures of the continuous bag-of-words (CBOW) model and the continuous Skip-gram model are as follows: CBOW predicts the current word from the surrounding context words, while Skip-gram predicts the surrounding context words given the current word.
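As a rough illustration of the difference between the two (a simplified sketch; the window size and whitespace tokenization are assumptions, not the paper's setup), here is how each architecture forms its training examples:

```python
def cbow_pairs(tokens, window=2):
    """CBOW: predict the center word from the surrounding context words."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each surrounding context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, ctx))
    return pairs

sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence)[:2])      # ([context words], target) examples
print(skipgram_pairs(sentence)[:4])  # (center, context) examples
```

In both cases the projection layer is a simple embedding lookup (CBOW additionally averages the context vectors), which is why removing the non-linear hidden layer makes training so cheap.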

4. Results

5. Examples of the Learned Relationships

6. Conclusion

In this paper, we studied the quality of vector representations of words derived by various models on a collection of syntactic and semantic language tasks. 
