CS224n (2019): Assignment 2 Reference Solutions

This post presents reference solutions to the derivation part of Assignment 2 from the CS224n course (Winter 2019). Questions and corrections are welcome.
Assignment 2 handout
Assignment 2 coding-part reference solutions

Variable notation

Note: all variable shapes here match the code part of Assignment 2, for ease of understanding.

$\boldsymbol U$, matrix of shape (vocab_size, embedding_dim), all the 'outside' vectors.

$\boldsymbol V$, matrix of shape (vocab_size, embedding_dim), all the 'center' vectors.

$\boldsymbol y$, vector of shape (vocab_size, 1); the true empirical distribution $\boldsymbol y$ is a one-hot vector with a 1 for the true outside word o, and 0 everywhere else.

$\hat{\boldsymbol{y}}$, vector of shape (vocab_size, 1); the predicted distribution $\hat{\boldsymbol{y}}$ is the probability distribution $P(O \mid C = c)$ given by our model.

question a

Given the outside word o and the center word c.

The distribution of y is as follows:

$$y_w = \begin{cases} 1 & w = o \\ 0 & w \ne o \end{cases}$$

$$-\sum_{w=1}^{V} y_w \log(\hat{y}_w) = -y_o \log(\hat{y}_o) = -\log(\hat{y}_o)$$

Here, V denotes the vocab_size.
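As a quick numerical sanity check (a sketch with made-up numbers, not part of the assignment), the full cross-entropy sum indeed collapses to the single term $-\log(\hat{y}_o)$:

```python
import numpy as np

# Hypothetical vocab of 4 words; suppose the true outside word is o = 2.
y_hat = np.array([0.1, 0.2, 0.6, 0.1])  # predicted distribution (sums to 1)
y = np.array([0.0, 0.0, 1.0, 0.0])      # one-hot empirical distribution

# Every term with y_w = 0 vanishes, so only -log(y_hat[o]) remains.
cross_entropy = -np.sum(y * np.log(y_hat))
assert np.isclose(cross_entropy, -np.log(y_hat[2]))
```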

question b

$$\begin{aligned}
\frac{\partial J_{\text{naive-softmax}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol v_c}
&= -\frac{\partial \log(P(O=o \mid C=c))}{\partial \boldsymbol v_c} \\
&= -\frac{\partial \log(\exp(\boldsymbol u_o^T \boldsymbol v_c))}{\partial \boldsymbol v_c} + \frac{\partial \log\left(\sum_{w=1}^{V} \exp(\boldsymbol u_w^T \boldsymbol v_c)\right)}{\partial \boldsymbol v_c} \\
&= -\boldsymbol u_o + \sum_{w=1}^{V} \frac{\exp(\boldsymbol u_w^T \boldsymbol v_c)}{\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)}\, \boldsymbol u_w \\
&= -\boldsymbol u_o + \sum_{w=1}^{V} P(O=w \mid C=c)\, \boldsymbol u_w \\
&= \boldsymbol U^T(\hat{\boldsymbol y} - \boldsymbol y)
\end{aligned}$$
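The closed form $\boldsymbol U^T(\hat{\boldsymbol y} - \boldsymbol y)$ can be verified against a central-difference numerical gradient. This is a hedged sketch with random toy data; `loss` is a local helper defined here, not assignment code:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, o = 5, 3, 2                # vocab_size, embedding_dim, true outside word
U = rng.normal(size=(V, d))      # outside vectors, one per row
v_c = rng.normal(size=d)         # center vector

def loss(v):
    # J_naive-softmax = -log softmax(U v)[o]
    scores = U @ v
    return -(scores[o] - np.log(np.sum(np.exp(scores))))

y_hat = np.exp(U @ v_c) / np.sum(np.exp(U @ v_c))
y = np.zeros(V); y[o] = 1.0
analytic = U.T @ (y_hat - y)     # closed form from the derivation above

# Central-difference numerical gradient, one coordinate at a time.
eps = 1e-6
numeric = np.array([
    (loss(v_c + eps * np.eye(d)[i]) - loss(v_c - eps * np.eye(d)[i])) / (2 * eps)
    for i in range(d)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```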

question c

$$\frac{\partial J_{\text{naive-softmax}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol u_w} = -\frac{\partial \log(\exp(\boldsymbol u_o^T \boldsymbol v_c))}{\partial \boldsymbol u_w} + \frac{\partial \log\left(\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)\right)}{\partial \boldsymbol u_w}$$

when w = o,

$$\begin{aligned}
\frac{\partial J_{\text{naive-softmax}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol u_o}
&= -\boldsymbol v_c + \frac{1}{\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)} \frac{\partial \sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)}{\partial \boldsymbol u_o} \\
&= -\boldsymbol v_c + \frac{1}{\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)} \frac{\partial \exp(\boldsymbol u_o^T \boldsymbol v_c)}{\partial \boldsymbol u_o} \\
&= -\boldsymbol v_c + \frac{\exp(\boldsymbol u_o^T \boldsymbol v_c)}{\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)}\, \boldsymbol v_c \\
&= (P(O=o \mid C=c) - 1)\, \boldsymbol v_c
\end{aligned}$$

when w != o,

$$\frac{\partial J_{\text{naive-softmax}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol u_w} = \frac{\exp(\boldsymbol u_w^T \boldsymbol v_c)}{\sum_{x=1}^{V} \exp(\boldsymbol u_x^T \boldsymbol v_c)}\, \boldsymbol v_c = P(O=w \mid C=c)\, \boldsymbol v_c$$

In summary,

$$\frac{\partial J_{\text{naive-softmax}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol U} = (\hat{\boldsymbol y} - \boldsymbol y)\, \boldsymbol v_c^T$$

This outer product has shape (vocab_size, embedding_dim), matching $\boldsymbol U$.
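In matrix form, the two cases stack row by row: row $w$ of the gradient is $(P(O=w \mid C=c)-1)\boldsymbol v_c$ for $w=o$ and $P(O=w \mid C=c)\boldsymbol v_c$ otherwise, i.e. the outer product of $\hat{\boldsymbol y}-\boldsymbol y$ with $\boldsymbol v_c$. A small sketch with toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
V, d, o = 5, 3, 2                # vocab_size, embedding_dim, true outside word
U = rng.normal(size=(V, d))
v_c = rng.normal(size=d)

y_hat = np.exp(U @ v_c) / np.sum(np.exp(U @ v_c))
y = np.zeros(V); y[o] = 1.0

# Stacking the per-row results gives an outer product with the same shape as U.
grad_U = np.outer(y_hat - y, v_c)
assert grad_U.shape == U.shape
assert np.allclose(grad_U[o], (y_hat[o] - 1.0) * v_c)   # the w == o row
assert np.allclose(grad_U[0], y_hat[0] * v_c)           # a w != o row
```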

question d

$$\frac{\partial \sigma(x)}{\partial x} = \frac{\partial}{\partial x}\frac{e^x}{e^x+1} = \frac{e^x(e^x+1) - e^x e^x}{(e^x+1)^2} = \frac{e^x}{(e^x+1)^2} = \sigma(x)(1-\sigma(x))$$
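A quick numerical check of the identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$, using a toy value and central differences:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6
# Central-difference approximation of the derivative at x.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
assert np.isclose(numeric, sigmoid(x) * (1 - sigmoid(x)))
```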

question e

i)
$$\begin{aligned}
\frac{\partial J_{\text{neg-sample}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol v_c}
&= \frac{\partial \left(-\log(\sigma(\boldsymbol u_o^T \boldsymbol v_c)) - \sum_{k=1}^{K} \log(\sigma(-\boldsymbol u_k^T \boldsymbol v_c))\right)}{\partial \boldsymbol v_c} \\
&= -\frac{\sigma(\boldsymbol u_o^T \boldsymbol v_c)(1-\sigma(\boldsymbol u_o^T \boldsymbol v_c))}{\sigma(\boldsymbol u_o^T \boldsymbol v_c)} \frac{\partial \boldsymbol u_o^T \boldsymbol v_c}{\partial \boldsymbol v_c} - \sum_{k=1}^{K} \frac{\partial \log(\sigma(-\boldsymbol u_k^T \boldsymbol v_c))}{\partial \boldsymbol v_c} \\
&= -(1-\sigma(\boldsymbol u_o^T \boldsymbol v_c))\, \boldsymbol u_o + \sum_{k=1}^{K} (1-\sigma(-\boldsymbol u_k^T \boldsymbol v_c))\, \boldsymbol u_k
\end{aligned}$$

ii)
$$\frac{\partial J_{\text{neg-sample}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol u_o} = \frac{\partial \left(-\log(\sigma(\boldsymbol u_o^T \boldsymbol v_c))\right)}{\partial \boldsymbol u_o} = -(1-\sigma(\boldsymbol u_o^T \boldsymbol v_c))\, \boldsymbol v_c$$

iii)
$$\frac{\partial J_{\text{neg-sample}}(\boldsymbol v_c, o, \boldsymbol U)}{\partial \boldsymbol u_k} = \frac{\partial \left(-\log(\sigma(-\boldsymbol u_k^T \boldsymbol v_c))\right)}{\partial \boldsymbol u_k} = (1-\sigma(-\boldsymbol u_k^T \boldsymbol v_c))\, \boldsymbol v_c$$
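The gradient from (i) can be checked against the negative-sampling loss numerically. A sketch with random toy vectors; `loss` is a local helper, and the K negatives are assumed distinct from o:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
d, K = 3, 4
u_o = rng.normal(size=d)          # true outside vector
U_neg = rng.normal(size=(K, d))   # K sampled negative vectors u_1..u_K
v_c = rng.normal(size=d)

def loss(v):
    # J_neg-sample = -log sigma(u_o . v) - sum_k log sigma(-u_k . v)
    return -np.log(sigmoid(u_o @ v)) - np.sum(np.log(sigmoid(-U_neg @ v)))

# Closed form from (i): -(1 - sigma(u_o.v)) u_o + sum_k (1 - sigma(-u_k.v)) u_k
analytic = -(1 - sigmoid(u_o @ v_c)) * u_o + U_neg.T @ (1 - sigmoid(-U_neg @ v_c))

# Central-difference numerical gradient.
eps = 1e-6
numeric = np.array([
    (loss(v_c + eps * np.eye(d)[i]) - loss(v_c - eps * np.eye(d)[i])) / (2 * eps)
    for i in range(d)
])
assert np.allclose(analytic, numeric, atol=1e-5)
```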

question f

i)

$$\frac{\partial J_{\text{skip-gram}}(\boldsymbol v_c, w_{t-m}, \ldots, w_{t+m}, \boldsymbol U)}{\partial \boldsymbol U} = \sum_{-m \le j \le m,\, j \ne 0} \frac{\partial J(\boldsymbol v_c, w_{t+j}, \boldsymbol U)}{\partial \boldsymbol U}$$

ii)

when w=c,

$$\frac{\partial J_{\text{skip-gram}}(\boldsymbol v_c, w_{t-m}, \ldots, w_{t+m}, \boldsymbol U)}{\partial \boldsymbol v_c} = \sum_{-m \le j \le m,\, j \ne 0} \frac{\partial J(\boldsymbol v_c, w_{t+j}, \boldsymbol U)}{\partial \boldsymbol v_c}$$

iii)

when w!=c,

$$\frac{\partial J_{\text{skip-gram}}(\boldsymbol v_c, w_{t-m}, \ldots, w_{t+m}, \boldsymbol U)}{\partial \boldsymbol v_w} = \boldsymbol 0$$
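Putting (i)-(iii) together: the window loss is a plain sum over the 2m outside words, so its gradients are sums of the per-word gradients, and every center vector other than $\boldsymbol v_c$ gets zero gradient. A sketch using the naive-softmax case with toy data; `naive_softmax_grads` is a hypothetical helper, not assignment code:

```python
import numpy as np

rng = np.random.default_rng(3)
V, d = 6, 3                      # vocab_size, embedding_dim
U = rng.normal(size=(V, d))
v_c = rng.normal(size=d)
outside_words = [1, 4, 5]        # outside words in one window (toy indices)

def naive_softmax_grads(v_c, o, U):
    # Per-outside-word gradients from questions b and c.
    y_hat = np.exp(U @ v_c) / np.sum(np.exp(U @ v_c))
    y = np.zeros(V); y[o] = 1.0
    return U.T @ (y_hat - y), np.outer(y_hat - y, v_c)  # (dJ/dv_c, dJ/dU)

# The window gradient is simply the sum of per-outside-word gradients.
grad_vc = np.zeros(d)
grad_U = np.zeros((V, d))
for o in outside_words:
    g_v, g_U = naive_softmax_grads(v_c, o, U)
    grad_vc += g_v
    grad_U += g_U
# Center vectors v_w with w != c never appear in the window loss,
# so their gradient is the zero vector.
```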
