DL && NLP: Efficient Estimation of Word Representations in Vector Space 【Word2vec】 notes

Purpose of the paper:

To learn better word-vector representations: achieve higher accuracy on very large data sets at a much lower computational complexity (i.e. fewer network parameters).

Method:

Starting from the NNLM architecture, the non-linear hidden layer (which accounts for most of the parameter computation) is removed, and the softmax output layer is replaced with a hierarchical softmax organized as a Huffman tree, reducing the per-word output cost to roughly log n.

Results:

Compared with the word vectors learned by the earlier NNLM, this architecture for learning word vectors is simpler and more accurate; the authors also find that addition and subtraction of word vectors carries semantic meaning.
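To make that additive property concrete, here is a minimal sketch (with toy, made-up 3-dimensional vectors; real ones would come from a trained word2vec model) of the kind of analogy the paper reports, e.g. vec("king") - vec("man") + vec("woman") lying closest to vec("queen"):

```python
import numpy as np

# Toy vectors made up for illustration; real ones would come from a trained
# word2vec model with hundreds of dimensions.
vectors = {
    "king":  np.array([0.8, 0.65, 0.10]),
    "man":   np.array([0.7, 0.10, 0.05]),
    "woman": np.array([0.7, 0.15, 0.80]),
    "queen": np.array([0.8, 0.70, 0.85]),
}

def most_similar(query, vocab, exclude=()):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    best_word, best_sim = None, -1.0
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

target = vectors["king"] - vectors["man"] + vectors["woman"]
print(most_similar(target, vectors, exclude={"king", "man", "woman"}))  # -> queen
```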

Abstract

This 2013 paper first introduced the word2vec model, which is now applied in many NLP tasks. In a nutshell, word2vec represents similarity among words very well, and it can be trained on large data sets, achieving better performance than previous neural network models.

In this paper, they propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task. In the experimental results, they observe large improvements in accuracy at much lower computational cost. Furthermore, these vectors provide state-of-the-art performance on their test set for measuring syntactic and semantic word similarities.

1. Introduction

Traditional NLP systems and techniques treat words as atomic units; there is no notion of similarity between words.

1.1 Goals of the Paper

The main goal is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words.

Similar words tend to be close to each other, and they can also have multiple degrees of similarity.

It was found that the similarity of word representations goes beyond simple syntactic regularities.

2. Model Architectures

We first define the computational complexity of a model as the number of parameters that need to be accessed to fully train the model. Next, we try to maximize the accuracy while minimizing the computational complexity.
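For reference, the paper expresses the overall training complexity as proportional to

$$O = E \times T \times Q$$

where $E$ is the number of training epochs, $T$ is the number of words in the training set, and $Q$ is a term defined separately for each model architecture.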

In our model, we use hierarchical softmax, where the vocabulary is represented as a Huffman binary tree.
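A minimal sketch (not the paper's implementation) of how such a Huffman tree assigns short binary codes to frequent words; the expected code length is what brings the number of output-layer evaluations per prediction down to roughly log2 of the vocabulary size:

```python
import heapq
import itertools

def huffman_codes(word_freqs):
    """Return {word: binary code string} built from a word-frequency dict."""
    counter = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(freq, next(counter), {word: ""}) for word, freq in word_freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

codes = huffman_codes({"the": 50, "cat": 10, "sat": 8, "on": 20, "mat": 5})
print(codes)  # the most frequent words receive the shortest codes
```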

3. New Log-Linear Models (the log comes from the Huffman tree)

The main observation from the previous section was that most of the complexity is caused by the non-linear hidden layer in the model.

The architectures of the continuous bag-of-words (CBOW) model and the continuous Skip-gram model are as follows: CBOW predicts the current word from the surrounding context words, while Skip-gram predicts the surrounding context words given the current word.
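As a rough illustration of the difference between the two (a simplified sketch; the window size and whitespace tokenization are assumptions, not the paper's setup), here is how each architecture forms its training examples:

```python
def cbow_pairs(tokens, window=2):
    """CBOW: predict the center word from the surrounding context words."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each surrounding context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((center, ctx))
    return pairs

sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence)[:2])      # ([context words], target) examples
print(skipgram_pairs(sentence)[:4])  # (center, context) examples
```

In both cases the projection layer is a simple embedding lookup (CBOW additionally averages the context vectors), which is why removing the non-linear hidden layer makes training so cheap.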

4. Results

5. Examples of the Learned Relationships

6. Conclusion

In this paper, we studied the quality of vector representations of words derived by various models on a collection of syntactic and semantic language tasks. 
