cs231n - assignment1 - softmax gradient derivation

Original post, 2016-07-16 15:19:07

Softmax exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.


This exercise is analogous to the SVM exercise. You will:


- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized expression for its analytic gradient
- check your implementation with numerical gradient
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights

As with linear_svm, the main difficulty is computing the derivative, although the softmax derivative is somewhat simpler.
First, the formula for the loss:

$$L = \frac{1}{N}\sum_i L_i + \lambda R(W) \tag{1}$$

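There are $N$ samples in total, and each sample contributes a loss $L_i$:
$$L_i = -\log p_{y_i} = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = -f_{y_i} + \log\sum_j e^{f_j} \tag{2}$$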
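
As a small illustration of equation (2), the sketch below evaluates $L_i$ for a single sample with NumPy. The function name `softmax_loss_single` and the example scores are made up for illustration. Subtracting the maximum score is the same stability trick used in the assignment code further down.

```python
import numpy as np

def softmax_loss_single(f, y_i):
    """Per-sample loss L_i = -f[y_i] + log(sum_j exp(f[j])), as in equation (2).

    f   : 1-D array of C class scores (hypothetical example input)
    y_i : integer index of the correct class
    """
    # Shifting every score by the same constant leaves L_i unchanged,
    # but keeps exp() from overflowing.
    shifted = f - np.max(f)
    return -shifted[y_i] + np.log(np.sum(np.exp(shifted)))

f = np.array([1.0, 2.0, 3.0])
loss_a = softmax_loss_single(f, 0)
loss_b = softmax_loss_single(f + 1000.0, 0)  # exp(1003) would overflow without the shift
```

Both calls give the same value, because adding a constant to all scores cancels between the two terms of (2).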

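For each sample $X_i$, the softmax denominator sums over all $f_j$, so the derivative of $L_i$ with respect to $W$ contributes to every column of $W$; that is, $\partial L_i / \partial W_j$ is nonzero for all $j$.
When $j \neq y_i$:

$$\frac{\partial L_i}{\partial W_j} = \frac{e^{f_j}}{\sum_k e^{f_k}}\frac{\partial f_j}{\partial W_j} = \frac{e^{f_j}}{\sum_k e^{f_k}} X_i^T \tag{3}$$

When $j = y_i$:
$$\frac{\partial L_i}{\partial W_j} = \frac{e^{f_{y_i}}}{\sum_k e^{f_k}}\frac{\partial f_{y_i}}{\partial W_j} - \frac{\partial f_{y_i}}{\partial W_j} = \frac{e^{f_{y_i}}}{\sum_k e^{f_k}} X_i^T - X_i^T \tag{4}$$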

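Computing the loss for every sample, summing (averaged over $N$ as in (1)), and adding the regularization term gives the final loss. The gradient is accumulated over the samples in the same way.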
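
The column-wise formulas (3) and (4) can be sanity-checked on one sample against a central-difference estimate. This is only a sketch under assumed names: `grad_single`, `loss_single`, the random sizes and the step `h` are all hypothetical, and regularization is left out.

```python
import numpy as np

def loss_single(W, x, y_i):
    """Per-sample softmax loss for input x (shape D) and weights W (shape D x C)."""
    f = x.dot(W)
    f = f - np.max(f)  # stability shift, does not change the loss
    return -f[y_i] + np.log(np.sum(np.exp(f)))

def grad_single(W, x, y_i):
    """Gradient of L_i with respect to W, built column by column from (3) and (4)."""
    f = x.dot(W)
    f = f - np.max(f)
    p = np.exp(f) / np.sum(np.exp(f))
    dW = np.zeros_like(W)
    for j in range(W.shape[1]):
        dW[:, j] = p[j] * x   # equation (3): every column receives p_j * X_i
    dW[:, y_i] -= x           # equation (4): the correct class also receives -X_i
    return dW

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # D = 4 features, C = 3 classes (made-up sizes)
x = rng.normal(size=4)
y_i = 1

# Central-difference estimate of the same gradient, one entry of W at a time
h = 1e-5
num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    num[idx] = (loss_single(W_plus, x, y_i) - loss_single(W_minus, x, y_i)) / (2 * h)

max_abs_diff = np.max(np.abs(num - grad_single(W, x, y_i)))
```

If the formulas are right, `max_abs_diff` should be tiny compared with the gradient entries themselves.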


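The derivation above writes the loss explicitly as a function of W and differentiates with respect to W directly. That works in this simple example, but once the network becomes more complex it is hard to write down directly the derivative of the loss with respect to the quantity of interest. A better approach is to apply the chain rule and differentiate layer by layer: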

$$p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}, \qquad L_i = -\log p_{y_i} \tag{5}$$

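Here $f_k$ is the score of class $k$, i.e. the value fed into the softmax. From equation (2), the derivative of the loss with respect to $f_k$ is:
$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k) \tag{6}$$
This shows that the derivative of the loss with respect to the score $f_k$ is $p_k$, with 1 subtracted when $k = y_i$.
Rewriting (6) in vector form:
$$\frac{\partial L_i}{\partial f} = p - [0 \;\dots\; 1 \;\dots\; 0] \quad (\text{the 1 is at position } y_i) \tag{6a}$$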
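
A minimal sketch of the vector form (6a), assuming a hypothetical helper `dloss_dscores` and made-up scores:

```python
import numpy as np

def dloss_dscores(f, y_i):
    """Return dL_i/df = p - one_hot(y_i), the vector form (6a)."""
    p = np.exp(f - np.max(f))   # shifted for numerical stability
    p /= np.sum(p)
    one_hot = np.zeros_like(p)
    one_hot[y_i] = 1.0
    return p - one_hot

f = np.array([0.5, -1.0, 2.0, 0.0])  # example scores for C = 4 classes
g = dloss_dscores(f, 2)              # correct class is index 2
```

Because the probabilities sum to 1, the entries of `g` sum to zero. The entry for the correct class is negative and all others are positive, so a gradient descent step raises the correct score and lowers the rest.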

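Now consider the second fully connected layer, the one that feeds directly into the softmax. Its input is the hidden layer output hidden_layer [1×H], so the softmax input is f = hidden_layer.dot(W2) + b2. Checking the dimensions: f is a C-dimensional vector, W2 is an H×C matrix, and b2 is a C-dimensional vector, so everything is consistent. We can now take the derivative of f with respect to W2: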

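$$\frac{\partial f}{\partial W} = \text{hidden\_layer}^T \quad (H \times 1) \tag{7}$$
As can be seen, $\partial f / \partial W$ (writing $W$ for W2) is simply the input vector of the fully connected layer, transposed.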

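Combining the results above gives the following, an $(H \times 1)$ by $(1 \times C)$ product that yields an $H \times C$ matrix:

$$\frac{\partial L_i}{\partial W} = \frac{\partial f}{\partial W}\,\frac{\partial L_i}{\partial f} \tag{8}$$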


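Finally, written in matrix form over all $N$ samples:

$$\frac{\partial L}{\partial f} = p_{[N \times C]} - \text{MaskMat}_{[N \times C]} \tag{6m}$$

$$\frac{\partial f}{\partial W} = \left(\text{hidden\_layer}_{[N \times H]}\right)^T \tag{7m}$$

$$\frac{\partial L}{\partial W} = \frac{\partial f}{\partial W}\,\frac{\partial L}{\partial f} \quad [H \times C] \tag{8m}$$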

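Here MaskMat in (6m) is made up of $N$ of the one-hot vectors from (6a), one row per sample. Its concrete form can be seen in the following Python code:

# compute the gradient on scores
dscores = probs
dscores[range(num_examples),y] -= 1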
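
To tie (6m) to (8m) together, here is a sketch of the whole backward step for the last layer on random data. The sizes, the random inputs and the name `dW2` are assumptions for illustration; the division by $N$ and the regularization term are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, C = 5, 4, 3                          # hypothetical sizes
hidden_layer = rng.normal(size=(N, H))     # stand-in for the hidden layer output
W2 = rng.normal(size=(H, C))
b2 = np.zeros(C)
y = rng.integers(0, C, size=N)             # random labels in [0, C)

# Forward pass: scores, then row-wise softmax probabilities
scores = hidden_layer.dot(W2) + b2
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores)
probs /= probs.sum(axis=1, keepdims=True)

# (6m): subtract the one-hot mask matrix from the probabilities
dscores = probs.copy()
dscores[np.arange(N), y] -= 1

# (8m): a single matrix product sums the contributions of all samples
dW2 = hidden_layer.T.dot(dscores)          # shape (H, C)

# The same quantity as a sum of per-sample outer products, equation (8)
dW2_loop = np.zeros_like(W2)
for i in range(N):
    dW2_loop += np.outer(hidden_layer[i], dscores[i])
```

The matrix product and the explicit loop agree, which is why the loop over samples can be dropped in a vectorized implementation.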

# softmax.py
import numpy as np
from random import shuffle

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  num_train = X.shape[0]
  num_classes = W.shape[1]
  for i in range(num_train):
    scores = X[i].dot(W) 
    scores -= np.max(scores) #prevents numerical instability
    correct_class_score = scores[y[i]]

    exp_sum = np.sum(np.exp(scores))
    loss += np.log(exp_sum) - correct_class_score

    dW[:, y[i]] -= X[i]
    for j in range(num_classes):
      dW[:,j] += (np.exp(scores[j]) / exp_sum) * X[i]

  loss /= num_train
  loss += 0.5 * reg * np.sum( W*W )
  dW /= num_train
  dW += reg * W

  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW


def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  num_train = X.shape[0]
  num_classes = W.shape[1]

  scores = X.dot(W)
  scores -= np.max(scores, axis = 1)[:, np.newaxis]
  exp_scores = np.exp(scores)
  sum_exp_scores = np.sum(exp_scores, axis = 1)
  correct_class_score = scores[range(num_train), y]

  loss = np.sum(np.log(sum_exp_scores)) - np.sum(correct_class_score)

  exp_scores = exp_scores / sum_exp_scores[:,np.newaxis]

  # Gradient in matrix form: dscores = probs - one_hot(y), then dW = X^T . dscores
  dscores = exp_scores  # already normalized above, so these are probabilities (N, C)
  dscores[range(num_train), y] -= 1
  dW = X.T.dot(dscores)

  loss /= num_train
  loss += 0.5 * reg * np.sum( W*W )
  dW /= num_train
  dW += reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW
# softmax.ipynb
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [5e-6, 1e-7, 5e-7]
regularization_strengths = [1e4, 5e4, 1e5]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
params = [(x,y) for x in learning_rates for y in regularization_strengths ]
for lrate, regular in params:
    softmax = Softmax()
    loss_hist = softmax.train(X_train, y_train, learning_rate=lrate, reg=regular,
                             num_iters=700, verbose=True)
    y_train_pred = softmax.predict(X_train)
    accuracy_train = np.mean( y_train == y_train_pred)
    y_val_pred = softmax.predict(X_val)
    accuracy_val = np.mean(y_val == y_val_pred)
    results[(lrate, regular)] = (accuracy_train, accuracy_val)
    if(best_val < accuracy_val):
        best_val = accuracy_val
        best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
Copyright notice: this is an original article by the blog author and may not be reposted without the author's permission.
