Part 1: Assignment overview
- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized expression for its analytic gradient
- check your implementation with numerical gradient
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights
Here SGD stands for stochastic gradient descent.
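As a rough illustration of the SGD step listed above, here is a minimal sketch on synthetic data. The helper names `softmax_loss` and `sgd`, the toy blobs, and all hyperparameter values are assumptions made for this example; they are not the assignment's `Softmax.train` implementation.

```python
import numpy as np

def softmax_loss(W, X, y, reg):
    """Mean cross-entropy loss plus L2 penalty, and its gradient w.r.t. W."""
    n = X.shape[0]
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)    # shift for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), y]).mean() + reg * np.sum(W * W)
    probs[np.arange(n), y] -= 1                    # dL/dscores = P - one_hot(y)
    dW = X.T @ probs / n + 2 * reg * W
    return loss, dW

def sgd(X, y, num_classes, lr=1e-1, reg=1e-4, num_iters=200, batch_size=32, seed=0):
    """Plain minibatch SGD: sample a batch, compute the gradient, step against it."""
    rng = np.random.default_rng(seed)
    W = 0.001 * rng.standard_normal((X.shape[1], num_classes))
    history = []
    for _ in range(num_iters):
        idx = rng.choice(X.shape[0], batch_size)   # sample indices with replacement
        loss, dW = softmax_loss(W, X[idx], y[idx], reg)
        W -= lr * dW
        history.append(loss)
    return W, history

# Toy data (hypothetical): three well-separated 2-D blobs plus a bias column.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y_toy = rng.integers(0, 3, size=300)
X_toy = centers[y_toy] + 0.5 * rng.standard_normal((300, 2))
X_toy = np.hstack([X_toy, np.ones((300, 1))])

W_toy, history = sgd(X_toy, y_toy, num_classes=3)
accuracy = np.mean(np.argmax(X_toy @ W_toy, axis=1) == y_toy)
print('first loss %f, last loss %f, train accuracy %f' % (history[0], history[-1], accuracy))
```

Each step only looks at `batch_size` samples, so the per-step loss is noisy, but on data this easy it should fall quickly from its starting value near log(3).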
Part 2: Main code with comments
#############################################################################
# TODO: Compute the softmax loss and its gradient using explicit loops. #
# Store the loss in loss and the gradient in dW. If you are not careful #
# here, it is easy to run into numeric instability. Don't forget the #
# regularization! #
#############################################################################
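for i in range(X.shape[0]):
    score = np.dot(X[i], W)
    score -= np.max(score)        # shift by the max score for numerical stability
    score = np.exp(score)         # exponentiate
    softmax_sum = np.sum(score)   # softmax denominator
    score /= softmax_sum          # normalize to get the softmax probabilities
    # accumulate the gradient
    for j in range(W.shape[1]):
        if j != y[i]:
            dW[:, j] += score[j] * X[i]
        else:
            dW[:, j] -= (1 - score[j]) * X[i]
    loss -= np.log(score[y[i]])   # cross-entropy of sample i
loss /= X.shape[0]                # average over the samples
dW /= X.shape[0]                  # average over the samples
loss += reg * np.sum(W * W)       # L2 regularization term
dW += 2 * reg * W                 # gradient of the regularization term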
#############################################################################
# END OF YOUR CODE #
#############################################################################
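The two branches of the inner loop follow from differentiating the cross-entropy loss with respect to the scores. A short derivation, where x_i is one row of X, s = x_i W, p_j is the softmax probability of class j, N is the number of samples, and λ corresponds to `reg` in the code (the symbols are notation chosen for this note):

```latex
\begin{aligned}
p_j &= \frac{e^{s_j}}{\sum_k e^{s_k}}, \qquad
L_i = -\log p_{y_i} = -s_{y_i} + \log \sum_k e^{s_k} \\
\frac{\partial L_i}{\partial s_j} &= p_j - \mathbb{1}[j = y_i] \\
\frac{\partial L_i}{\partial W_{:,j}} &= \left(p_j - \mathbb{1}[j = y_i]\right) x_i^{\top} \\
\nabla_W L &= \frac{1}{N} \sum_{i=1}^{N} x_i^{\top} \left(p^{(i)} - e_{y_i}\right) + 2 \lambda W
\end{aligned}
```

For j ≠ y_i the per-sample gradient is p_j x_i, and for j = y_i it is -(1 - p_j) x_i, which is what the `if`/`else` accumulates.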
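Inline Question 1:
Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your answer: W is initialized with small random values, so the scores of all classes are close to zero and nearly equal, and softmax assigns almost the same probability to every class. This is a 10-class problem, so each class gets a probability of about 0.1, and the cross-entropy loss is about -log(0.1) ≈ 2.30.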
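The claim can be checked numerically. The sketch below uses synthetic standard-normal inputs instead of CIFAR-10 images, and assumes a weight scale of 0.0001 and 3073 input features (3072 pixels plus a bias term); all of these values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, dim, num_classes = 500, 3073, 10

# Synthetic stand-ins for the training data (assumption: not real images).
X = rng.standard_normal((num_train, dim))
y = rng.integers(0, num_classes, size=num_train)
W = 0.0001 * rng.standard_normal((dim, num_classes))   # small random weights

scores = X @ W
scores -= scores.max(axis=1, keepdims=True)            # numerical stability
probs = np.exp(scores)
probs /= probs.sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(num_train), y]).mean()

print('loss: %f' % loss)
print('-log(0.1): %f' % -np.log(0.1))
```

With weights this small every probability stays close to 0.1, so the two printed values should nearly agree. A much larger initial loss usually points to a bug or to weights that are not small.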
#############################################################################
# TODO: Compute the softmax loss and its gradient using no explicit loops. #
# Store the loss in loss and the gradient in dW. If you are not careful #
# here, it is easy to run into numeric instability. Don't forget the #
# regularization! #
#############################################################################
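scores = np.dot(X, W)
scores -= np.max(scores, axis=1, keepdims=True)   # shift each row by its max for numerical stability
scores = np.exp(scores)                           # exponentiate
scores /= np.sum(scores, axis=1, keepdims=True)   # normalize each row to get the softmax probabilities
ds = np.copy(scores)
ds[np.arange(X.shape[0]), y] -= 1                 # dL/dS = P - one_hot(y)
dW = np.dot(X.T, ds)                              # chain rule through S = X W
loss = scores[np.arange(X.shape[0]), y]           # probability of the correct class for each sample
loss = -np.log(loss).sum()
loss /= X.shape[0]                                # average over the samples
dW /= X.shape[0]                                  # average over the samples
loss += reg * np.sum(W * W)                       # L2 regularization term
dW += 2 * reg * W                                 # gradient of the regularization term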
#############################################################################
# END OF YOUR CODE #
#############################################################################
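The analytic gradient can be verified against a numerical one, as the assignment asks. The following is a minimal sketch with a centered finite difference on a tiny random problem; `softmax_loss_vectorized` here is a standalone re-statement of the code above, and `numerical_gradient`, the problem size, and the step `h` are assumptions for illustration rather than the course's own gradient-check utility.

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Softmax loss and gradient without explicit loops."""
    n = X.shape[0]
    scores = X @ W
    scores -= np.max(scores, axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), y]).sum() / n + reg * np.sum(W * W)
    ds = probs.copy()
    ds[np.arange(n), y] -= 1
    dW = X.T @ ds / n + 2 * reg * W
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Centered difference (f(W + h) - f(W - h)) / (2h), one entry at a time."""
    grad = np.zeros_like(W)
    for ix in np.ndindex(*W.shape):
        old = W[ix]
        W[ix] = old + h
        f_plus = f(W)
        W[ix] = old - h
        f_minus = f(W)
        W[ix] = old                      # restore the original value
        grad[ix] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.integers(0, 3, size=20)
W = 0.01 * rng.standard_normal((5, 3))
reg = 0.1

_, analytic = softmax_loss_vectorized(W, X, y, reg)
numeric = numerical_gradient(lambda w: softmax_loss_vectorized(w, X, y, reg)[0], W)
max_abs_diff = np.max(np.abs(analytic - numeric))
print('max abs difference: %e' % max_abs_diff)
```

A difference many orders of magnitude below the gradient entries themselves suggests the analytic expression is right. Checking every entry is only practical for small W, so on the full problem it is common to check a random subset of entries.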
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e3, 5e3, 7e3]
################################################################################
# TODO: #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save #
# the best trained softmax classifier in best_softmax. #
################################################################################
from copy import deepcopy
for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        # positional arguments: learning rate, regularization strength, 1500 iterations, batch size 128
        softmax.train(X_train, y_train, lr, reg, 1500, 128)
        train_pred = softmax.predict(X_train)
        train_acc = np.mean(train_pred == y_train)
        val_pred = softmax.predict(X_val)
        val_acc = np.mean(val_pred == y_val)
        results[(lr, reg)] = [train_acc, val_acc]
        if val_acc > best_val:
            # keep a copy of the best classifier seen so far
            best_val = val_acc
            best_softmax = deepcopy(softmax)
################################################################################
# END OF YOUR CODE #
################################################################################
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)
Inline Question - True or False
It's possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
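Your answer: True
Your explanation: Under the SVM loss, the new data point may be easy to classify. If its correct-class score exceeds every other score by at least the margin, every max(0, ·) term is zero and the point contributes nothing to the loss. The softmax classifier, in contrast, always produces a probability distribution in which the correct class has a probability below 1, so the cross-entropy of the new point is strictly positive. In other words, the softmax loss always gains a term, even if it is a very small one.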
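A small numerical illustration of this answer: for one confidently classified point, the hinge loss is exactly zero while the cross-entropy is small but positive. The score vector, the margin `delta=1.0`, and the helper names are assumptions chosen for this example.

```python
import numpy as np

def svm_loss_single(scores, y, delta=1.0):
    """Multiclass hinge loss of one sample."""
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0                      # the correct class is not penalized
    return margins.sum()

def softmax_loss_single(scores, y):
    """Cross-entropy loss of one sample, computed with a max shift for stability."""
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

# Hypothetical scores for a point whose correct class (index 0) wins by a wide margin.
scores = np.array([10.0, 2.0, -1.0])
y = 0

svm = svm_loss_single(scores, y)
softmax = softmax_loss_single(scores, y)
print('SVM loss: %f' % svm)
print('Softmax loss: %e' % softmax)
```

Pushing the correct score even higher shrinks the softmax loss further, but in exact arithmetic it never reaches zero, because the other classes always keep some probability mass.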