CS224n Assignment 1 Reference Solutions

Assignment #1 solution, by Jonariguez

All code for the programming questions has been uploaded to github/CS224n/Jonariguez.

1a

Solution:

$$softmax(\mathbf{x})_i=\frac{e^{x_i}}{\sum_{j}{e^{x_j}}}=\frac{e^c e^{x_i}}{e^c\sum_{j}{e^{x_j}}}=\frac{e^{x_i+c}}{\sum_{j}{e^{x_j+c}}}=softmax(\mathbf{x}+c)_i$$

Therefore

$$softmax(\mathbf{x})=softmax(\mathbf{x}+c)$$

Q.E.D.
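As a quick numerical sanity check of this shift invariance, here is a minimal numpy sketch (the helper naive_softmax is hypothetical and only illustrates the property; it is not part of the assignment code):

import numpy as np

def naive_softmax(x):
    # Plain softmax for a 1-D vector, used only to illustrate the property above
    e = np.exp(x)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
c = 100.0  # any constant shift works
print(np.allclose(naive_softmax(x), naive_softmax(x + c)))  # True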

1b

Solution:
This can be implemented directly with numpy. Note that you should first subtract each row's maximum from $x$: by part 1a the result is unchanged, every element becomes at most 0, and overflow is avoided, so the computation stays numerically correct. See http://www.hankcs.com/ml/computing-log-sum-exp.html for details.

import numpy as np

def softmax(x):
   """Compute the softmax function for each row of the input x.

   It is crucial that this function is optimized for speed because
   it will be used frequently in later code. You might find numpy
   functions np.exp, np.sum, np.reshape, np.max, and numpy
   broadcasting useful for this task.

   Numpy broadcasting documentation:
   http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

   You should also make sure that your code works for a single
   N-dimensional vector (treat the vector as a single row) and
   for M x N matrices. This may be useful for testing later. Also,
   make sure that the dimensions of the output match the input.

   You must implement the optimization in problem 1(a) of the
   written assignment!

   Arguments:
   x -- A N dimensional vector or M x N dimensional numpy matrix.

   Return:
   x -- You are allowed to modify x in-place
   """
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract each row's maximum from that row (numerical stability)
        x = x - np.max(x, axis=1).reshape(x.shape[0], 1)
        # Then apply the softmax row-wise
        x = np.exp(x) / np.sum(np.exp(x), axis=1).reshape(x.shape[0], 1)
    else:
        # Vector: subtract the maximum, then apply softmax
        x = x - np.max(x)
        x = np.exp(x) / np.sum(np.exp(x))

    assert x.shape == orig_shape
    return x
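A quick usage sketch of the function above (printed values are approximate and follow directly from the definition):

print(softmax(np.array([1, 2])))                   # [0.26894142 0.73105858]
print(softmax(np.array([[1001, 1002], [3, 4]])))   # each row is [0.26894142 0.73105858]; large inputs do not overflow
print(softmax(np.array([[-1001, -1002]])))         # [[0.73105858 0.26894142]]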

2a

Solution:

$$\sigma'(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=\sigma(x)\cdot(1-\sigma(x))$$

That is, the derivative of the sigmoid function can be expressed in terms of the sigmoid itself.
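A minimal numpy sketch of this result; as an assumption of this sketch, sigmoid_grad takes the sigmoid output s rather than the raw input, so the derivative is simply s * (1 - s):

import numpy as np

def sigmoid(x):
    # Element-wise sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(s):
    # Derivative of the sigmoid expressed through its own output s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-1.0, 0.0, 1.0])
s = sigmoid(x)
print(sigmoid_grad(s))  # [0.19661193 0.25       0.19661193]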

2b

Solution:
Since the true label $y$ is a one-hot vector, the derivation below assumes $y_k=1$ and $y_i=0$ for $i\neq k$, i.e., the true class is $k$.

$$\frac{\partial CE(y,\hat{y})}{\partial\theta}=\frac{\partial CE(y,\hat{y})}{\partial\hat{y}}\cdot\frac{\partial\hat{y}}{\partial\theta}$$

where:

$$\frac{\partial CE(y,\hat{y})}{\partial\hat{y}}=-\sum_{i}{\frac{y_i}{\hat{y}_i}}=-\frac{1}{\hat{y}_k}$$

Next, consider $\frac{\partial\hat{y}}{\partial\theta}$:

1. When $i=k$:

$$\frac{\partial\hat{y}_k}{\partial\theta_k}=\frac{\partial}{\partial\theta_k}\left(\frac{e^{\theta_k}}{\sum_{j}{e^{\theta_j}}}\right)=\hat{y}_k\cdot(1-\hat{y}_k)$$

Then:

$$\frac{\partial CE}{\partial\theta_i}=\frac{\partial CE}{\partial\hat{y}}\cdot\frac{\partial\hat{y}}{\partial\theta_i}=-\frac{1}{\hat{y}_k}\cdot\hat{y}_k\cdot(1-\hat{y}_k)=\hat{y}_i-1$$
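The case $i\neq k$ can be handled the same way; for completeness, a sketch of that step under the same one-hot assumption:

2. When $i\neq k$:

$$\frac{\partial\hat{y}_k}{\partial\theta_i}=\frac{\partial}{\partial\theta_i}\left(\frac{e^{\theta_k}}{\sum_{j}{e^{\theta_j}}}\right)=-\hat{y}_k\cdot\hat{y}_i$$

so

$$\frac{\partial CE}{\partial\theta_i}=-\frac{1}{\hat{y}_k}\cdot(-\hat{y}_k\cdot\hat{y}_i)=\hat{y}_i$$

Combining both cases gives the vector form $\frac{\partial CE(y,\hat{y})}{\partial\theta}=\hat{y}-y$.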
