Reading Notes: "Distributed Representations of Words and Phrases and their Compositionality"

1. Contributions of the Paper

The paper extends the existing Skip-gram model to improve both the quality of the word vectors and the speed of training.

The authors propose two innovations:

  • Subsampling of frequent words, which speeds up training and also improves the quality of the learned vectors.
  • Negative sampling, proposed as an alternative to hierarchical softmax.
2. Main Contributions of Prior Work
  • Mikolov proposed the Skip-gram model, which learns high-quality word vectors quickly because, unlike earlier neural-network approaches to learning word vectors, it involves no dense matrix multiplications. The learned vectors also capture certain patterns as linear relationships: "We found that simple vector addition can often produce meaningful results. This compositionality suggests that a non-obvious degree of language understanding can be obtained by using basic mathematical operations on the word vector representations."
  • Recursive Autoencoders use phrase vectors instead of word vectors, representing the meaning of a sentence by composing the vectors of its words.
  • An alternative to the hierarchical softmax is Noise Contrastive Estimation (NCE), introduced by Gutmann and Hyvarinen and applied to language modeling by Mnih and Teh. NCE posits that a good model should be able to differentiate data from noise by means of logistic regression.
3. Problems Unsolved by Existing Methods
  • Word vectors cannot represent certain phrases and idioms well; the meaning of such phrases cannot be derived from the vectors of their constituent words.
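The paper addresses this by first detecting phrases with a simple data-driven score, score(w_a, w_b) = (count(w_a w_b) − δ) / (count(w_a) · count(w_b)), where the discounting coefficient δ prevents rare bigrams from scoring too high; pairs above a threshold are merged into single tokens. A minimal sketch (the δ and threshold values here are illustrative, not the paper's tuned settings):

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=0.005):
    """Score adjacent word pairs; pairs scoring above the threshold
    are treated as phrases: score = (count(ab) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = {}
    for (a, b), c_ab in bigrams.items():
        score = (c_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases
```

Because δ is subtracted from the bigram count, a pair seen fewer than δ times can never qualify, which filters out coincidental co-occurrences.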
4. Preliminaries
The Skip-gram model

What the model does: "The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document."
Given a training sequence w_1, …, w_T, the model maximizes the average log probability

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$

where c is the size of the training context.
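Concretely, each position t contributes one (center, context) pair per word within a window of c around it. A minimal sketch of generating these training pairs (the window size is illustrative):

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every position, mirroring the
    Skip-gram sum over -window <= j <= window, j != 0."""
    pairs = []
    for t, center in enumerate(tokens):
        lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs
```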

Hierarchical Softmax
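In hierarchical softmax the vocabulary words sit at the leaves of a binary tree, and p(w | w_I) is a product of sigmoids along the root-to-leaf path, so evaluating one output word costs O(log |V|) instead of O(|V|). A minimal sketch, assuming the path's inner-node vectors and branch directions are given (the function and argument names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(v_in, path_node_vecs, directions):
    """p(w | w_I) = product over inner nodes n on the root-to-leaf path of
    sigmoid(+/- v_n . v_in), with the sign set by which child the path takes."""
    p = 1.0
    for v_n, d in zip(path_node_vecs, directions):  # d = +1 left, -1 right
        p *= sigmoid(d * np.dot(v_n, v_in))
    return p
```

Since sigmoid(x) + sigmoid(-x) = 1 at every inner node, the leaf probabilities sum to 1 over the whole vocabulary by construction.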

5. Core Algorithms
5.1. Negative Sampling

The Negative Sampling objective used in place of each log-softmax term is

$$\log \sigma({v'_{w_O}}^{\top} v_{w_I}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma(-{v'_{w_i}}^{\top} v_{w_I}) \right]$$

where σ is the sigmoid, k is the number of negative samples, and P_n(w) is the noise distribution.

Negative Sampling is a simplification of NCE that the authors derive for this specific setting: since the goal is high-quality vectors rather than well-calibrated probabilities, the NCE machinery can be pared down.

The difference between the two: "The main difference between the Negative sampling and NCE is that NCE needs both samples and the numerical probabilities of the noise distribution, while Negative sampling uses only samples."
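A minimal sketch of evaluating this per-pair objective, with the k noise-word vectors passed in directly, plus the noise distribution the paper found to work best (unigram counts raised to the 3/4 power); names here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(v_in, v_out, noise_vecs):
    """log sigma(v_out . v_in) + sum_i log sigma(-v_noise_i . v_in).
    Maximizing this pushes the true context word up and k noise words down."""
    obj = np.log(sigmoid(np.dot(v_out, v_in)))
    for v_n in noise_vecs:
        obj += np.log(sigmoid(-np.dot(v_n, v_in)))
    return obj

def unigram_noise_probs(counts):
    """Noise distribution P_n(w) proportional to unigram_count^(3/4)."""
    p = np.array(counts, dtype=float) ** 0.75
    return p / p.sum()
```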

5.2. Subsampling of Frequent Words

Underlying intuition: "Frequent words usually provide less information value than the rare words."

The vector representations of frequent words do not change significantly after training on several million examples.

The empirical formula the authors propose: each word w_i in the training corpus is discarded with probability

$$P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$$

where f(w_i) is the frequency of word w_i and t is a chosen threshold (around 10^{-5} in the paper).

"We chose this subsampling formula because it aggressively subsamples words whose frequency is greater than t while preserving the ranking of the frequencies. Although this subsampling formula was chosen heuristically, we found it to work well in practice. It accelerates learning and even significantly improves the accuracy."
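A minimal sketch of the discard rule: each occurrence of w_i is dropped with probability 1 − sqrt(t / f(w_i)), so words with frequency at or below t are never dropped (the rng hook here is just for illustration):

```python
import random

def discard_prob(freq, t=1e-5):
    """P(w) = 1 - sqrt(t / f(w)); clamped at 0 for words with f(w) <= t."""
    return max(0.0, 1.0 - (t / freq) ** 0.5)

def subsample(tokens, freqs, t=1e-5, rng=random.random):
    """Keep each token occurrence with probability 1 - P(w)."""
    return [w for w in tokens if rng() >= discard_prob(freqs[w], t)]
```

Note how a very frequent word (say f = 0.01) is dropped about 97% of the time, while a rare one is always kept, which is exactly the "aggressive above t, untouched below" behavior the quote describes.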

6. Experiments

Omitted.

7. Interpreting the Results
Additive Compositionality

Examples from the paper: vec("Russia") + vec("river") is close to vec("Volga River"), and vec("Germany") + vec("capital") is close to vec("Berlin").
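A minimal sketch of this kind of composition query: add two vectors and look up the nearest vocabulary word by cosine similarity (the toy embeddings below are made up purely for illustration):

```python
import numpy as np

def nearest(query_vec, embeddings, exclude=()):
    """Return the vocabulary word whose vector has the highest cosine
    similarity to query_vec, skipping the words in `exclude`."""
    best_w, best_sim = None, -2.0
    q = query_vec / np.linalg.norm(query_vec)
    for w, v in embeddings.items():
        if w in exclude:
            continue
        sim = np.dot(q, v / np.linalg.norm(v))
        if sim > best_sim:
            best_w, best_sim = w, sim
    return best_w
```

Excluding the query words themselves matters in practice, since the summed vector is usually closest to one of its own operands.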

8. Conclusion

In short: subsampling of frequent words and Negative Sampling make Skip-gram training faster and the resulting vectors more accurate, frequent phrases can be given vectors of their own, and simple vector addition often produces meaningful compositions.
