Lecture 1
Main topics of this lecture:
- word representation
- word2vec
How to represent words?
Problems with resources like WordNet
1. Great as a resource but missing nuance
2. Missing new meanings of words
   - Impossible to keep up-to-date
3. Subjective
4. Requires human labor to create and adapt
5. Can't compute accurate word similarity
Problems with words as discrete symbols (one-hot vectors)
1. No natural notion of similarity: any two distinct one-hot vectors are orthogonal
2. The vector dimension equals the vocabulary size, which is huge
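A small Python sketch makes both points concrete (the three-word vocabulary is an illustrative assumption): distinct one-hot vectors always have dot product 0, and each vector is as long as the vocabulary.

```python
import numpy as np

# Toy vocabulary; a real vocabulary easily has 10^5-10^6 entries,
# so one-hot vectors are both huge and sparse.
vocab = ["motel", "hotel", "banking"]
V = len(vocab)

def one_hot(word: str) -> np.ndarray:
    """Return the |V|-dimensional one-hot vector for `word`."""
    vec = np.zeros(V)
    vec[vocab.index(word)] = 1.0
    return vec

# Any two distinct one-hot vectors are orthogonal: their dot product is 0,
# so "motel" looks exactly as unrelated to "hotel" as to "banking".
print(one_hot("motel") @ one_hot("hotel"))    # 0.0
print(one_hot("motel") @ one_hot("banking"))  # 0.0
```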
Word2vec
Word2Vec: objective function
We derive the objective by maximum likelihood.
For each position $t = 1, \dots, T$, predict context words within a window of fixed size $m$, given center word $w_t$.
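To make the setup concrete, here is a short Python sketch of how the (center, context) training pairs can be enumerated; the corpus, the window size, and the helper name `skipgram_pairs` are illustrative assumptions, not part of the lecture.

```python
def skipgram_pairs(tokens, m):
    """Yield (center, context) pairs for every position t = 1..T,
    taking context words within a window of size m around w_t."""
    T = len(tokens)
    for t in range(T):
        for j in range(-m, m + 1):
            if j == 0 or not (0 <= t + j < T):
                continue  # skip the center word itself and out-of-range positions
            yield tokens[t], tokens[t + j]

# Example with window size m = 2
corpus = "problems turning into banking crises".split()
for center, context in skipgram_pairs(corpus, m=2):
    print(center, "->", context)
```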
The likelihood is:
$$\text{Likelihood} = L(\theta) = \prod_{t=1}^{T}\ \prod_{\substack{-m \le j \le m \\ j \neq 0}} P(w_{t+j} \mid w_t; \theta)$$
The loss function is the average negative log-likelihood:
$$J(\theta) = -\frac{1}{T}\log L(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\ \sum_{\substack{-m \le j \le m \\ j \neq 0}} \log P(w_{t+j} \mid w_t; \theta)$$
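As a sanity check, the objective can be evaluated directly on a toy example. The sketch below uses randomly initialized toy embeddings and the softmax form of $P(o \mid c)$ defined in the next subsection; the names `log_p` and `loss` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "problems turning into banking crises".split()
vocab = sorted(set(corpus))
V, d = len(vocab), 8
idx = {w: i for i, w in enumerate(vocab)}

# theta packs two toy matrices: center vectors v_w and context vectors u_w.
v = rng.normal(scale=0.1, size=(V, d))  # row i is v_w for vocab[i] (center role)
u = rng.normal(scale=0.1, size=(V, d))  # row i is u_w for vocab[i] (context role)

def log_p(o: str, c: str) -> float:
    """log P(o | c) under the softmax parameterization."""
    scores = u @ v[idx[c]]                      # u_w^T v_c for every w in V
    return scores[idx[o]] - np.log(np.exp(scores).sum())

def loss(tokens, m):
    """J(theta): average negative log-probability of each in-window context word."""
    T, total = len(tokens), 0.0
    for t in range(T):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < T:
                total += log_p(tokens[t + j], tokens[t])
    return -total / T

print(loss(corpus, m=2))  # a positive number; smaller is better
```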
How do we compute $P(w_{t+j} \mid w_t; \theta)$?
Define the notation as follows; each word $w$ gets two vectors:
- $v_w$ when $w$ is a center word
- $u_w$ when $w$ is a context word
Then, for a center word $c$ and a context word $o$,
$$P(o \mid c) = \frac{\exp(u_o^T v_c)}{\sum\limits_{w \in V} \exp(u_w^T v_c)}$$
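Here is a compact sketch of this softmax, assuming toy random vectors; subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the distribution.

```python
import numpy as np

def p_dist(u: np.ndarray, v_c: np.ndarray) -> np.ndarray:
    """P(. | c): softmax over the scores u_w^T v_c for all w in V.
    Subtracting the max score avoids overflow in exp() and leaves
    the resulting probabilities unchanged."""
    scores = u @ v_c            # shape (|V|,), entry w is u_w^T v_c
    scores -= scores.max()
    e = np.exp(scores)
    return e / e.sum()

# The probabilities are positive and sum to 1 by construction.
rng = np.random.default_rng(0)
u = rng.normal(size=(5, 8))    # toy context vectors, |V| = 5, d = 8
v_c = rng.normal(size=8)       # toy center vector
print(p_dist(u, v_c).sum())    # 1.0
```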
Let $f(\theta) = \log P(o \mid c)$ and take the partial derivative with respect to $v_c$:
$$\begin{aligned}
\frac{\partial f(\theta)}{\partial v_c}
&= \frac{\partial}{\partial v_c}\log\frac{\exp(u_o^T v_c)}{\sum\limits_{w\in V}\exp(u_w^T v_c)} \\
&= \frac{\partial}{\partial v_c}\log\exp(u_o^T v_c) - \frac{\partial}{\partial v_c}\log\sum\limits_{w\in V}\exp(u_w^T v_c)
\end{aligned}$$
The first term (the $\log$ and $\exp$ cancel):
$$f_1(\theta) = \frac{\partial}{\partial v_c}\log\exp(u_o^T v_c) = \frac{\partial}{\partial v_c}\,u_o^T v_c = u_o$$
The second term, by the chain rule (differentiate the $\log$, then the sum of exponentials, with $x$ as a dummy index over the vocabulary):
$$\begin{aligned}
f_2(\theta)
&= \frac{\partial}{\partial v_c}\log\sum\limits_{w\in V}\exp(u_w^T v_c) \\
&= \frac{1}{\sum\limits_{w\in V}\exp(u_w^T v_c)}\sum\limits_{x\in V}\exp(u_x^T v_c)\cdot u_x \\
&= \sum\limits_{x\in V}\frac{\exp(u_x^T v_c)}{\sum\limits_{w\in V}\exp(u_w^T v_c)}\cdot u_x \\
&= \sum\limits_{x\in V}P(x \mid c)\cdot u_x
\end{aligned}$$
Combining the two terms:
$$\frac{\partial f(\theta)}{\partial v_c} = f_1(\theta) - f_2(\theta) = u_o - \sum\limits_{x\in V}P(x \mid c)\cdot u_x$$
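This analytic gradient can be checked numerically with central differences on a toy example; the sizes and random vectors below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 8
u = rng.normal(size=(V, d))   # context vectors u_w
v_c = rng.normal(size=d)      # center vector v_c
o = 2                         # index of the observed context word

def f(v_c):
    """f(theta) = log P(o | c)."""
    scores = u @ v_c
    return scores[o] - np.log(np.exp(scores).sum())

# Analytic gradient: u_o - sum_x P(x|c) u_x
p = np.exp(u @ v_c); p /= p.sum()
analytic = u[o] - p @ u

# Numerical gradient via central differences
eps = 1e-6
numeric = np.array([
    (f(v_c + eps * e) - f(v_c - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```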
From probability theory, $\sum\limits_{x\in V} P(x \mid c)\cdot u_x$ is the expectation of the context vector under the model's current distribution $P(\cdot \mid c)$. The gradient $\frac{\partial f}{\partial v_c} = u_o - \sum\limits_{x\in V}P(x \mid c)\cdot u_x$ is therefore "observed minus expected": the difference between the actually observed context vector $u_o$ and the model's expected context vector. Updating $v_c$ along this gradient moves the model's expectation toward the observed context word, which is exactly what maximizing $\log P(o \mid c)$ should do.
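Assuming a plain gradient-ascent update (a sketch, not the lecture's full training loop; the helper name `sgd_step_vc` and all toy data are hypothetical), one step on a single (center, context) pair moves $v_c$ along "observed minus expected" and increases $P(o \mid c)$:

```python
import numpy as np

def sgd_step_vc(u: np.ndarray, v_c: np.ndarray, o: int, lr: float = 0.05) -> np.ndarray:
    """One hypothetical SGD step on v_c for a single (center, context) pair:
    ascend log P(o|c) along u_o - sum_x P(x|c) u_x ('observed minus expected')."""
    p = np.exp(u @ v_c)
    p /= p.sum()                       # P(. | c)
    grad = u[o] - p @ u                # d log P(o|c) / d v_c
    return v_c + lr * grad             # ascent on the log-likelihood

# After the step, P(o|c) should increase.
rng = np.random.default_rng(0)
u, v_c, o = rng.normal(size=(5, 8)), rng.normal(size=8), 2
before = np.exp(u @ v_c)[o] / np.exp(u @ v_c).sum()
v_c = sgd_step_vc(u, v_c, o)
after = np.exp(u @ v_c)[o] / np.exp(u @ v_c).sum()
print(before < after)  # True
```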