CS224n (Winter 2019) Notes: Lecture 1

Topics covered in this lecture:

  • Word representation
  • Word2vec

How do we represent words?

Problems with resources like WordNet

1. Great as a resource but missing nuance
2. Missing new meanings of words
  • Impossible to keep up-to-date
3. Subjective
4. Requires human labor to create and adapt
5. Can't compute accurate word similarity

Problems with words as discrete symbols (one-hot vectors)

1. No natural notion of similarity for one-hot vectors (see the sketch below)
2. Vector dimension is too large (it equals the vocabulary size)
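To make problem 1 concrete, here is a minimal Python sketch (the toy `vocab` list is illustrative, not from the lecture): any two distinct one-hot vectors are orthogonal, so their dot product reports a similarity of 0, and the vector dimension grows with the vocabulary.

```python
import numpy as np

# Toy vocabulary; real vocabularies run to hundreds of thousands of words.
vocab = ["motel", "hotel", "cat"]
V = len(vocab)

def one_hot(word):
    """Return the one-hot vector for a word: all zeros except a single 1."""
    vec = np.zeros(V)
    vec[vocab.index(word)] = 1.0
    return vec

motel, hotel = one_hot("motel"), one_hot("hotel")

# Distinct one-hot vectors are orthogonal, so their dot product is 0:
# "motel" looks no more similar to "hotel" than to "cat".
print(motel @ hotel)   # 0.0
print(motel.shape)     # (3,) -- grows linearly with vocabulary size
```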

Word2vec

Word2Vec: objective function

We estimate the parameters by maximum likelihood.
For each position $t = 1, \ldots, T$, predict context words within a window of fixed size $m$, given center word $w_t$.

Likelihood:

$$Likelihood = L(\theta) = \prod_{t=1}^{T}\prod_{-m\le j\le m,\ j\ne 0} P(w_{t+j}\mid w_t;\theta)$$

Loss function (average negative log-likelihood):

$$J(\theta) = -\frac{1}{T}\log L(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{-m\le j\le m,\ j\ne 0}\log P(w_{t+j}\mid w_t;\theta)$$
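As a minimal sketch of what the two products/sums range over, the snippet below enumerates the (center, context) pairs that each contribute one $\log P(w_{t+j}\mid w_t;\theta)$ term, using a toy corpus and window size $m=2$ (both placeholders):

```python
# Enumerate the (center, context) pairs summed over in J(theta),
# for a toy corpus and a fixed window size m (illustrative example).
corpus = "problems turning into banking crises as".split()
m = 2  # window size

pairs = []
for t, center in enumerate(corpus):              # t = 1..T in the formula
    for j in range(-m, m + 1):
        if j == 0 or not (0 <= t + j < len(corpus)):
            continue                             # skip the center word itself
        pairs.append((center, corpus[t + j]))

for center, context in pairs[:4]:
    print(f"P({context} | {center})")
```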

How do we compute $P(w_{t+j}\mid w_t;\theta)$?

Define two vectors per word $w$:

  • $v_w$ when $w$ is a center word
  • $u_w$ when $w$ is a context word


Then for a center word $c$ and a context (outside) word $o$:

$$P(o\mid c) = \frac{\exp(u_o^T v_c)}{\sum\limits_{w\in V}\exp(u_w^T v_c)}$$
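A minimal NumPy sketch of this softmax, assuming random placeholder vectors and an embedding dimension of 5 (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 5                 # vocabulary size, embedding dimension
U = rng.normal(size=(V, d))  # context ("outside") vectors u_w, one row per word
v_c = rng.normal(size=d)     # center vector v_c

def p_o_given_c(o, v_c, U):
    """Softmax: P(o|c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)."""
    scores = U @ v_c                      # u_w^T v_c for every w in V
    scores -= scores.max()                # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores[o] / exp_scores.sum()

print(p_o_given_c(3, v_c, U))                         # a probability in (0, 1)
print(sum(p_o_given_c(o, v_c, U) for o in range(V)))  # sums to 1.0
```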

Let $f(\theta) = \log P(o\mid c)$ and take the partial derivative with respect to $v_c$:

$$\begin{aligned} \frac{\partial f(\theta)}{\partial v_c} &= \frac{\partial}{\partial v_c}\log\frac{\exp(u_o^T v_c)}{\sum\limits_{w\in V}\exp(u_w^T v_c)} \\ &= \frac{\partial}{\partial v_c}\log\exp(u_o^T v_c) - \frac{\partial}{\partial v_c}\log\sum\limits_{w\in V}\exp(u_w^T v_c) \end{aligned}$$

The first term:

$$f_1(\theta) = \frac{\partial}{\partial v_c}\log\exp(u_o^T v_c) = \frac{\partial}{\partial v_c}\, u_o^T v_c = u_o$$

The second term (by the chain rule):

$$\begin{aligned} f_2(\theta) &= \frac{\partial}{\partial v_c}\log\sum\limits_{w\in V}\exp(u_w^T v_c) \\ &= \frac{1}{\sum\limits_{w\in V}\exp(u_w^T v_c)}\sum\limits_{x\in V}\exp(u_x^T v_c)\cdot u_x \\ &= \sum\limits_{x\in V}\frac{\exp(u_x^T v_c)}{\sum\limits_{w\in V}\exp(u_w^T v_c)}\cdot u_x \\ &= \sum\limits_{x\in V} P(x\mid c)\cdot u_x \end{aligned}$$

Putting the two terms together:

$$\frac{\partial f(\theta)}{\partial v_c} = f_1(\theta) - f_2(\theta) = u_o - \sum\limits_{x\in V} P(x\mid c)\cdot u_x$$
By probability theory, $\sum_{x\in V} P(x\mid c)\,u_x$ is exactly the expected context vector under the model's current distribution $P(\cdot\mid c)$. The gradient $\frac{\partial f}{\partial v_c}$ is therefore the difference between the observed context vector $u_o$ and this expectation; taking a gradient step on $v_c$ pulls the model's expected context vector toward the context word actually observed.
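As a sanity check on the derivation, this sketch (again with random placeholder vectors) compares the analytic gradient $u_o - \sum_{x\in V}P(x\mid c)\,u_x$ against a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, o = 10, 5, 3            # vocab size, embedding dim, index of word o
U = rng.normal(size=(V, d))   # context vectors u_w
v_c = rng.normal(size=d)      # center vector v_c

def log_p(v_c):
    """f(theta) = log P(o|c) for the softmax model above."""
    scores = U @ v_c
    return scores[o] - np.log(np.exp(scores).sum())

# Analytic gradient: u_o - sum_x P(x|c) u_x
probs = np.exp(U @ v_c)
probs /= probs.sum()
analytic = U[o] - probs @ U

# Central finite-difference estimate of the same gradient.
eps = 1e-6
numeric = np.array([
    (log_p(v_c + eps * np.eye(d)[i]) - log_p(v_c - eps * np.eye(d)[i])) / (2 * eps)
    for i in range(d)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```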
