cs231n - assignment1 - softmax gradient derivation

Softmax exercise

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

This exercise is analogous to the SVM exercise. You will:

- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized expression for its analytic gradient
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights

$$L = \frac{1}{N}\sum_i L_i + \lambda R(W) \tag{1}$$

$$L_i = -\log p_{y_i} = -\log\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = -f_{y_i} + \log\sum_j e^{f_j} \tag{2}$$

When $j \ne y_i$:

$$\frac{\partial L_i}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}}\frac{\partial f_j}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}} X_i^T \tag{3}$$

When $j = y_i$:

$$\frac{\partial L_i}{\partial W_{y_i}} = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\frac{\partial f_{y_i}}{\partial W_{y_i}} - X_i^T = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} X_i^T - X_i^T \tag{4}$$

$$p_k = \frac{e^{f_k}}{\sum_j e^{f_j}}, \qquad L_i = -\log p_{y_i} \tag{5}$$

$$\frac{\partial L_i}{\partial f_k} = p_k - \mathbb{1}(y_i = k) \tag{6}$$

$$\frac{\partial L_i}{\partial f} = p - [0, \ldots, 1, \ldots, 0] \quad \text{(the 1 is at position } y_i\text{)} \tag{6a}$$

$$\frac{\partial f}{\partial W} = \texttt{hidden\_layer.T} \quad (H \times 1) \tag{7}$$

$$\frac{\partial L_i}{\partial W} = \frac{\partial f}{\partial W} \cdot \frac{\partial L_i}{\partial f} \tag{8}$$

For a minibatch of $N$ examples the same chain rule applies with batched shapes:

$$\frac{\partial f}{\partial W} = \texttt{hidden\_layer.T} \quad [N \times H] \tag{7m}$$

$$\frac{\partial L}{\partial W} = \frac{\partial f}{\partial W} \cdot \frac{\partial L}{\partial f} \quad [H \times C] \tag{8m}$$

# compute the gradient on scores
dscores = probs
dscores[range(num_examples), y] -= 1
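Equations (5)–(8) are easy to verify numerically. The standalone sketch below (toy shapes chosen for illustration) builds the softmax probabilities for a small random batch, forms `dscores = probs - one_hot(y)` exactly as in the snippet above, backpropagates to `dW = X.T.dot(dscores)`, and checks the result against a centered finite-difference gradient:

```python
import numpy as np

def softmax_loss(W, X, y):
    # f = XW, row-wise softmax with max-subtraction for numerical stability
    scores = X.dot(W)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    N = X.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean()
    # equation (6): dL/df_k = p_k - 1(y_i = k), averaged over the batch
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    # equations (7)/(8): dL/dW = X^T . dL/df
    dW = X.T.dot(dscores)
    return loss, dW

rng = np.random.RandomState(0)
X = rng.randn(5, 4)           # N=5 examples, D=4 features
y = rng.randint(0, 3, 5)      # C=3 classes
W = rng.randn(4, 3) * 0.01

loss, dW = softmax_loss(W, X, y)

# centered finite-difference check over every entry of W
h = 1e-5
num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    Wp = W.copy(); Wp[idx] += h
    Wm = W.copy(); Wm[idx] -= h
    num[idx] = (softmax_loss(Wp, X, y)[0] - softmax_loss(Wm, X, y)[0]) / (2 * h)

print(np.max(np.abs(num - dW)))  # should be tiny, well below 1e-6
```

If the analytic gradient matches the numerical one to high precision, equations (3)/(4) and (6) are consistent.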

# softmax.py
import numpy as np
from random import shuffle

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.     #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    num_train = X.shape[0]
    num_classes = W.shape[1]
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)  # shift scores to prevent numerical instability
        correct_class_score = scores[y[i]]

        exp_sum = np.sum(np.exp(scores))
        loss += np.log(exp_sum) - correct_class_score

        # equations (3)/(4): every class j gets p_j * X[i];
        # the correct class additionally gets -X[i]
        dW[:, y[i]] -= X[i]
        for j in range(num_classes):
            dW[:, j] += (np.exp(scores[j]) / exp_sum) * X[i]

    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W

    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    num_train = X.shape[0]
    num_classes = W.shape[1]

    scores = X.dot(W)
    scores -= np.max(scores, axis=1)[:, np.newaxis]  # numerical stability
    exp_scores = np.exp(scores)
    sum_exp_scores = np.sum(exp_scores, axis=1)
    correct_class_score = scores[range(num_train), y]

    loss = np.sum(np.log(sum_exp_scores)) - np.sum(correct_class_score)

    # normalize the exponentiated scores into probabilities
    exp_scores = exp_scores / sum_exp_scores[:, np.newaxis]

    # this loop could be rewritten as a single matrix operation
    for i in range(num_train):
        dW += exp_scores[i] * X[i][:, np.newaxis]
        dW[:, y[i]] -= X[i]

    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W
    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW
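As the comment in `softmax_loss_vectorized` notes, the remaining per-example loop can be collapsed into one matrix product: subtract 1 at the correct-class entries of the normalized probabilities, then multiply by `X.T`. A standalone sketch (shapes assumed as in the assignment: X is N×D, probabilities are N×C) comparing the two forms:

```python
import numpy as np

def grad_loop(X, probs, y):
    # the per-example loop from the vectorized implementation above
    D, C = X.shape[1], probs.shape[1]
    dW = np.zeros((D, C))
    for i in range(X.shape[0]):
        dW += probs[i] * X[i][:, np.newaxis]   # outer product X_i (x) p_i
        dW[:, y[i]] -= X[i]
    return dW

def grad_matrix(X, probs, y):
    # equivalent matrix form: dscores = probs - one_hot(y); dW = X^T dscores
    dscores = probs.copy()
    dscores[np.arange(X.shape[0]), y] -= 1
    return X.T.dot(dscores)

rng = np.random.RandomState(1)
X = rng.randn(6, 4)
scores = X.dot(rng.randn(4, 3))
probs = np.exp(scores - scores.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
y = rng.randint(0, 3, 6)

print(np.allclose(grad_loop(X, probs, y), grad_matrix(X, probs, y)))  # True
```

The matrix form is exactly equations (7m)/(8m): `X.T` plays the role of $\partial f / \partial W$ and `dscores` is $\partial L / \partial f$.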

# softmax.ipynb
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [5e-6, 1e-7, 5e-7]
regularization_strengths = [1e4, 5e4, 1e5]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
params = [(x, y) for x in learning_rates for y in regularization_strengths]
for lrate, regular in params:
    softmax = Softmax()
    loss_hist = softmax.train(X_train, y_train, learning_rate=lrate, reg=regular,
                              num_iters=700, verbose=True)
    y_train_pred = softmax.predict(X_train)
    accuracy_train = np.mean(y_train == y_train_pred)
    y_val_pred = softmax.predict(X_val)
    accuracy_val = np.mean(y_val == y_val_pred)
    results[(lrate, regular)] = (accuracy_train, accuracy_val)
    if best_val < accuracy_val:
        best_val = accuracy_val
        best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
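Once the coarse grid above identifies a promising region, a finer random search often squeezes out a bit more validation accuracy. A small sketch of the sampling step (the ranges here are hypothetical, taken from the grid above; the training loop would be the same as before): sample candidates log-uniformly rather than on a fixed grid, since learning rate and regularization strength act multiplicatively.

```python
import numpy as np

rng = np.random.RandomState(0)
num_trials = 10

# log-uniform sampling: draw the exponent uniformly, then exponentiate,
# so candidates spread evenly across orders of magnitude
learning_rates = 10 ** rng.uniform(np.log10(1e-7), np.log10(5e-6), num_trials)
regularization_strengths = 10 ** rng.uniform(np.log10(1e4), np.log10(1e5), num_trials)

for lr, reg in zip(learning_rates, regularization_strengths):
    # train a Softmax() here exactly as in the grid search above,
    # keeping the candidate with the best validation accuracy
    print('candidate lr %e reg %e' % (lr, reg))
```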