In Depth: Skip-gram, Detailed Derivation and Analysis

Jay_Tang

Category: NLP Core Derivations. Tags: natural language processing, machine learning

Article link: https://blog.csdn.net/Jay_Tang/article/details/105577295

Comparison between CBOW and Skip-gram

The major difference between the two word2vec models is that Skip-gram handles infrequent words better than CBOW. For simplicity, suppose there is a sentence “$w_1 w_2 w_3 w_4$” and the window size is $1$.

CBOW learns to predict a word given its context, that is, to maximize the following probability

$P(w_2|w_1,w_3) \cdot P(w_3|w_2,w_4)$

This is an issue for infrequent words, since they rarely appear as the target of a given context. As a result, the model assigns them low probabilities.

Skip-gram learns to predict the context given a word, that is, to maximize the following probability (written for the same center words $w_2$ and $w_3$ as in the CBOW case)

$P(w_1|w_2) \cdot P(w_3|w_2) \cdot P(w_2|w_3) \cdot P(w_4|w_3)$

In this case, a rare word and a frequent word are treated the same way: each is observed both as a center word and as a context word. Hence, the model learns useful representations even for rare words.
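To make the comparison concrete, here is a minimal sketch of the training examples each model would see for the toy sentence. The function names `cbow_examples` and `skipgram_examples` are hypothetical, and the sketch assumes only center positions with a full window are used, matching the CBOW expression above.

```python
def cbow_examples(tokens, window=1):
    """Return (context words, target word) pairs: one example per center position."""
    examples = []
    # Assumption: only positions with a full window on both sides are used.
    for i in range(window, len(tokens) - window):
        context = tokens[i - window:i] + tokens[i + 1:i + window + 1]
        examples.append((tuple(context), tokens[i]))
    return examples


def skipgram_examples(tokens, window=1):
    """Return (center word, context word) pairs: one example per neighbor."""
    examples = []
    for i in range(window, len(tokens) - window):
        for j in range(i - window, i + window + 1):
            if j != i:
                examples.append((tokens[i], tokens[j]))
    return examples


tokens = ["w1", "w2", "w3", "w4"]
print(cbow_examples(tokens))      # each example predicts one word from its whole context
print(skipgram_examples(tokens))  # each example predicts one context word from the center
```

CBOW produces a single prediction per center position, while Skip-gram produces one per (center, context) pair, so every word occurrence contributes several separate training signals.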

Skip-gram

Main idea of Skip-gram

Goal: The Skip-gram model aims to learn continuous feature representations for words by optimizing a neighborhood-preserving likelihood objective.
Assumption: The Skip-gram objective is based on the distributional hypothesis, which states that words in similar contexts tend to have similar meanings. That is, similar words tend to appear in similar word neighborhoods.
Algorithm: It scans over the words of a document and embeds each word so that the word’s features can predict nearby words (i.e., words inside some context window). The word feature representations are learned by optimizing the likelihood objective using SGD with negative sampling.
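The "SGD with negative sampling" step can be sketched as follows. This is a minimal illustration in pure Python, not a reference implementation: the function name `sgns_step`, the learning rate, the vector dimension, and the random initialization are all assumptions. It uses the standard negative-sampling loss, $-\log \sigma(u \cdot v)$ for an observed context word and $-\log \sigma(-u \cdot v)$ for each sampled negative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(center_vec, context_vecs, labels, lr=0.1):
    """One SGD step on the negative-sampling loss for a single center word.

    context_vecs: output vectors of the observed context word and the sampled negatives
    labels: 1 for the observed context word, 0 for each negative sample
    Vectors are updated in place; the loss before the update is returned.
    """
    dim = len(center_vec)
    grad_center = [0.0] * dim
    loss = 0.0
    for vec, label in zip(context_vecs, labels):
        score = sigmoid(sum(c * o for c, o in zip(center_vec, vec)))
        loss -= math.log(score if label == 1 else 1.0 - score)
        err = score - label  # derivative of the loss w.r.t. the dot product
        for k in range(dim):
            grad_center[k] += err * vec[k]       # accumulate using the old output vector
            vec[k] -= lr * err * center_vec[k]   # update the output vector
    for k in range(dim):
        center_vec[k] -= lr * grad_center[k]     # update the center vector last
    return loss


random.seed(0)
dim = 8  # assumed embedding size, for illustration only
center = [random.uniform(-0.5, 0.5) for _ in range(dim)]
contexts = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(4)]
labels = [1, 0, 0, 0]  # one observed context word, three negatives
losses = [sgns_step(center, contexts, labels) for _ in range(50)]
print(losses[0], losses[-1])
```

Each step touches only the center vector and a handful of output vectors, which is what makes negative sampling cheap compared with normalizing over the whole vocabulary.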

Skip-gram model formulation

Skip-gram learns to predict the context given a word by optimizing the likelihood objective. Suppose now we have a sentence

$\text{"I am writing a summary for NLP."}$

and the model is trying to predict the context words given the target word “summary” with window size $2$:

$\text{I am [ ] [ ] summary [ ] [ ] .}$

Then the model tries to maximize the likelihood

$P(\text{"writing"}|\text{"summary"}) \cdot P(\text{"a"}|\text{"summary"}) \cdot P(\text{"for"}|\text{"summary"}) \cdot P(\text{"NLP"}|\text{"summary"})$
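The factors of this likelihood can be read off mechanically from the window. Below is a small sketch; the helper name `context_window` is hypothetical, and the tokenization (whitespace splitting, with the final period as its own token) is an assumption.

```python
def context_window(tokens, target, window=2):
    """Return the words within `window` positions of the first occurrence of `target`."""
    i = tokens.index(target)
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right


# Assumed tokenization of the example sentence.
tokens = ["I", "am", "writing", "a", "summary", "for", "NLP", "."]
context = context_window(tokens, "summary")
print(context)  # ['writing', 'a', 'for', 'NLP']

# One conditional probability per context word, multiplied together.
factors = [f'P("{w}" | "summary")' for w in context]
print(" * ".join(factors))
```

In practice the product is usually replaced by a sum of log-probabilities, which has the same maximizer and is numerically more stable.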
