Reading Notes: "Distributed Representations of Words and Phrases and their Compositionality"

1. Contributions of the Paper

The paper extends the existing Skip-gram model to improve both the quality of the word vectors and the speed of training.

The authors propose two innovations:

  • Subsampling of frequent words, which speeds up training and also improves the quality of the learned vectors.
  • Negative sampling, proposed as an alternative to hierarchical softmax.
2. Main Contributions of Prior Work
  • Mikolov proposed the Skip-gram model, which learns high-quality word vectors quickly because, unlike earlier neural-network approaches to learning word vectors, it involves no dense matrix multiplications. The learned vectors also capture certain patterns as linear relationships: "We found that simple vector addition can often produce meaningful results. This compositionality suggests that a non-obvious degree of language understanding can be obtained by using basic mathematical operations on the word vector representations."
  • Recursive Autoencoders use phrase vectors instead of word vectors, representing the meaning of a sentence by composing the vectors of its words.
  • An alternative to the hierarchical softmax is Noise Contrastive Estimation (NCE), introduced by Gutmann and Hyvarinen and applied to language modeling by Mnih and Teh. NCE posits that a good model should be able to differentiate data from noise by means of logistic regression.
3. Problems Unsolved by Existing Methods
  • Word vectors cannot represent certain phrases and idioms well; the meaning of such phrases cannot be derived from the vectors of their constituent words.
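The paper addresses this by first detecting phrases with a simple data-driven score, score(w_a, w_b) = (count(w_a w_b) − δ) / (count(w_a) · count(w_b)), where the discounting coefficient δ prevents rare bigrams from scoring too high; pairs above a threshold are merged into single tokens. A minimal sketch (the δ and threshold values here are illustrative, not the paper's tuned settings):

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=0.005):
    """Score adjacent word pairs; pairs scoring above the threshold
    are treated as phrases: score = (count(ab) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = {}
    for (a, b), c_ab in bigrams.items():
        score = (c_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases
```

Because δ is subtracted from the bigram count, a pair seen fewer than δ times can never qualify, which filters out coincidental co-occurrences.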
4. Preliminaries
The Skip-gram model

What the model does: "The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document."
Given a training sequence w_1, …, w_T, the model maximizes the average log probability

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$

where c is the size of the training context.
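Concretely, each position t contributes one (center, context) pair per word within a window of c around it. A minimal sketch of generating these training pairs (the window size is illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every position, mirroring the
    Skip-gram sum over -window <= j <= window, j != 0."""
    pairs = []
    for t, center in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs
```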

Hierarchical Softmax
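In hierarchical softmax the vocabulary words sit at the leaves of a binary tree, and p(w | w_I) is a product of sigmoids along the root-to-leaf path, so evaluating one output word costs O(log |V|) instead of O(|V|). A minimal sketch, assuming the path's inner-node vectors and branch directions are given (the function and argument names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(v_in, path_node_vecs, directions):
    """p(w | w_I) = product over inner nodes n on the root-to-leaf path of
    sigmoid(+/- v_n . v_in), with the sign set by which child the path takes."""
    p = 1.0
    for v_n, d in zip(path_node_vecs, directions):  # d = +1 left, -1 right
        p *= sigmoid(d * np.dot(v_n, v_in))
    return p
```

Since sigmoid(x) + sigmoid(-x) = 1 at every inner node, the leaf probabilities sum to 1 over the whole vocabulary by construction.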

5. Core Algorithms
5.1. Negative Sampling

The Negative Sampling objective used in place of each log-softmax term is

$$\log \sigma({v'_{w_O}}^{\top} v_{w_I}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma(-{v'_{w_i}}^{\top} v_{w_I}) \right]$$

where σ is the sigmoid, k is the number of negative samples, and P_n(w) is the noise distribution.

Negative Sampling is a simplification of NCE that the authors derive for this specific setting: since the goal is high-quality vectors rather than well-calibrated probabilities, the NCE machinery can be pared down.

The difference between the two: "The main difference between the Negative sampling and NCE is that NCE needs both samples and the numerical probabilities of the noise distribution, while Negative sampling uses only samples."
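A minimal sketch of evaluating this per-pair objective, with the k noise-word vectors passed in directly, plus the noise distribution the paper found to work best (unigram counts raised to the 3/4 power); names here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(v_in, v_out, noise_vecs):
    """log sigma(v_out . v_in) + sum_i log sigma(-v_noise_i . v_in).
    Maximizing this pushes the true context word up and k noise words down."""
    obj = np.log(sigmoid(np.dot(v_out, v_in)))
    for v_n in noise_vecs:
        obj += np.log(sigmoid(-np.dot(v_n, v_in)))
    return obj

def unigram_noise_probs(counts):
    """Noise distribution P_n(w) proportional to unigram_count^(3/4)."""
    p = np.array(counts, dtype=float) ** 0.75
    return p / p.sum()
```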

5.2. Subsampling of Frequent Words

Underlying intuition: "Frequent words usually provide less information value than the rare words."

The vector representations of frequent words do not change significantly after training on several million examples.

The empirical formula the authors propose: each word w_i in the training corpus is discarded with probability

$$P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$$

where f(w_i) is the frequency of word w_i and t is a chosen threshold (around 10^{-5} in the paper).

"We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies. Although this subsampling formula was chosen heuristically, we found it to work well in practice. It accelerates learning and even significantly improves the accuracy."
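A minimal sketch of the discard rule: each occurrence of w_i is dropped with probability 1 − sqrt(t / f(w_i)), so words with frequency at or below t are never dropped (the rng hook here is just for illustration):

```python
import random

def discard_prob(freq, t=1e-5):
    """P(w) = 1 - sqrt(t / f(w)); clamped at 0 for words with f(w) <= t."""
    return max(0.0, 1.0 - (t / freq) ** 0.5)

def subsample(tokens, freqs, t=1e-5, rng=random.random):
    """Keep each token occurrence with probability 1 - P(w)."""
    return [w for w in tokens if rng() >= discard_prob(freqs[w], t)]
```

Note how a very frequent word (say f = 0.01) is dropped about 97% of the time, while a rare one is always kept, which is exactly the "aggressive above t, untouched below" behavior the quote describes.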

6. Experiments

Omitted.

7. Interpreting the Results
Additive Compositionality

Examples from the paper: vec("Russia") + vec("river") is close to vec("Volga River"), and vec("Germany") + vec("capital") is close to vec("Berlin").
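A minimal sketch of this kind of composition query: add two vectors and look up the nearest vocabulary word by cosine similarity (the toy embeddings below are made up purely for illustration):

```python
import numpy as np

def nearest(query_vec, embeddings, exclude=()):
    """Return the vocabulary word whose vector has the highest cosine
    similarity to query_vec, skipping the words in `exclude`."""
    best_w, best_sim = None, -2.0
    q = query_vec / np.linalg.norm(query_vec)
    for w, v in embeddings.items():
        if w in exclude:
            continue
        sim = np.dot(q, v / np.linalg.norm(v))
        if sim > best_sim:
            best_w, best_sim = w, sim
    return best_w
```

Excluding the query words themselves matters in practice, since the summed vector is usually closest to one of its own operands.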

8. Conclusion

In short: subsampling of frequent words and Negative Sampling make Skip-gram training faster and the resulting vectors more accurate, frequent phrases can be given vectors of their own, and simple vector addition often produces meaningful compositions.
