Softmax
The Softmax loss function is
$$L_{i}=-\log p_{y_{i}}=-\log \left(\frac{e^{f_{y_{i}}}}{\sum_{j} e^{f_{j}}}\right)=-f_{y_{i}}+\log \sum_{j} e^{f_{j}}$$
For the gradient derivation, see the article: 【学习笔记】cs231n中assignment1中的 Softmax exercise.
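For reference, the per-example gradient that the code below implements follows directly from this loss. Writing $p_j = e^{f_j} / \sum_k e^{f_k}$ for the softmax probability of class $j$,

$$\frac{\partial L_i}{\partial W_{:,j}} = \left(p_j - \mathbb{1}\{j = y_i\}\right) x_i$$

i.e. each column of dW receives $p_j \, x_i$, and the column of the correct class additionally receives $-x_i$.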
Code:
The softmax_loss_naive() function in softmax.py:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
num_train = X.shape[0]
num_class = W.shape[1]
for i in range(num_train):
    score = X[i].dot(W)
    score -= np.max(score)  # shift scores for numerical stability
    correct_score = score[y[i]]  # score of the correct class
    exp_sum = np.sum(np.exp(score))
    loss += np.log(exp_sum) - correct_score
    for j in range(num_class):  # range, not Python 2's xrange
        if j == y[i]:
            dW[:, j] += np.exp(score[j]) / exp_sum * X[i] - X[i]
        else:
            dW[:, j] += np.exp(score[j]) / exp_sum * X[i]
loss /= num_train
loss += 0.5 * reg * np.sum(W * W)
dW /= num_train
dW += reg * W
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
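As a quick sanity check on the naive implementation (a minimal sketch, assuming softmax_loss_naive is importable from cs231n.classifiers.softmax as laid out in the assignment; the data shapes and sizes below are made up for illustration), the analytic gradient can be compared against a centered numerical difference at a few random entries of W:

import numpy as np
from cs231n.classifiers.softmax import softmax_loss_naive  # assumed assignment layout

np.random.seed(0)
X = np.random.randn(50, 3073)           # 50 fake examples with the CIFAR-10 feature size
y = np.random.randint(10, size=50)      # fake labels
W = np.random.randn(3073, 10) * 0.0001  # small random weights

loss, grad = softmax_loss_naive(W, X, y, 0.0)
h = 1e-5
for _ in range(5):
    i, j = np.random.randint(W.shape[0]), np.random.randint(W.shape[1])
    W[i, j] += h
    loss_plus, _ = softmax_loss_naive(W, X, y, 0.0)
    W[i, j] -= 2 * h
    loss_minus, _ = softmax_loss_naive(W, X, y, 0.0)
    W[i, j] += h                        # restore the original entry
    numeric = (loss_plus - loss_minus) / (2 * h)
    print('numeric: %f, analytic: %f' % (numeric, grad[i, j]))  # the two should be close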
The softmax_loss_vectorized() function in softmax.py:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
num_train = X.shape[0]
score = X.dot(W)
# Subtract each row's max (axis=1) for numerical stability; score stays 500x10
score -= np.max(score, axis=1)[:, np.newaxis]
# correct_score has shape (500,)
correct_score = score[range(num_train), y]
exp_score = np.exp(score)
# sum_exp_score has shape (500,)
sum_exp_score = np.sum(exp_score, axis=1)
# Compute the loss
loss = np.sum(np.log(sum_exp_score) - correct_score)
loss /= num_train
loss += 0.5 * reg * np.sum(W * W)
# Compute the gradient
margin = np.exp(score) / sum_exp_score.reshape(num_train, 1)
margin[np.arange(num_train), y] += -1
dW = X.T.dot(margin)
dW /= num_train
dW += reg * W
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
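To confirm that the vectorized version matches the naive one (a sketch along the lines of what the notebook does; the random data and reg value here are placeholders, and the same assumed import path is used as above), the two losses and gradients can be compared directly:

import numpy as np
from cs231n.classifiers.softmax import softmax_loss_naive, softmax_loss_vectorized  # assumed layout

np.random.seed(1)
X = np.random.randn(500, 3073)
y = np.random.randint(10, size=500)
W = np.random.randn(3073, 10) * 0.0001

loss_naive, grad_naive = softmax_loss_naive(W, X, y, 5e4)
loss_vec, grad_vec = softmax_loss_vectorized(W, X, y, 5e4)
print('loss difference: %e' % abs(loss_naive - loss_vec))                 # should be ~0
print('gradient difference: %e' % np.linalg.norm(grad_naive - grad_vec))  # Frobenius norm, should be ~0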
Next, pick suitable hyperparameters.
In the softmax notebook:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
iters = 2000
for lr in learning_rates:
    for rs in regularization_strengths:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=iters)
        plt.plot(loss_hist)
        plt.xlabel('Iteration number')
        plt.ylabel('Loss value')
        plt.show()
        y_train_pred = softmax.predict(X_train)
        acc_train = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        acc_val = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (acc_train, acc_val)
        if best_val < acc_val:
            best_val = acc_val
            best_softmax = softmax
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
For this hyperparameter search, you can sweep several values of each parameter at once with a fairly large number of iterations, then gradually narrow the search range based on validation accuracy.
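For example, a first coarse sweep might look like the following (these particular values are my own illustration, not the ones used above), after which the grid can be refined around whichever (lr, rs) pair gives the best validation accuracy:

# Coarse grid (illustrative values only)
learning_rates = [1e-7, 5e-7, 1e-6]
regularization_strengths = [1e4, 2.5e4, 5e4]
# run the loop above and inspect `results`, then refine around the best pair,
# e.g. if lr=5e-7, rs=2.5e4 won the first round:
learning_rates = [3e-7, 5e-7, 7e-7]
regularization_strengths = [2e4, 2.5e4, 3e4]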
The final test-set accuracy reaches 0.38.
Inline Question
Inline Question 1
Why do we expect our loss to be close to -log(0.1)? Explain briefly.
$\color{blue}{\textit{Your Answer:}}$
Because W is initialized randomly, the probability assigned to the correct class is roughly 1/10 for each of the 10 classes, so the loss is approximately -log(0.1).
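As a quick numerical check:

$$-\log\left(\tfrac{1}{10}\right) = \log 10 \approx 2.3026$$

so an initial loss near 2.30 is expected.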
Inline Question 2 - True or False
Suppose the overall training loss is defined as the sum of the per-datapoint loss over all training examples. It is possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
$\color{blue}{\textit{Your Answer:}}$
True.
$\color{blue}{\textit{Your Explanation:}}$
For the SVM loss, if a newly added image is classified correctly with all margins satisfied, its per-example loss is exactly 0, so the total training loss can stay unchanged. For the Softmax loss, however, every datapoint contributes a positive loss whether or not it is classified correctly; even when that loss is close to 0 it is never exactly 0, so the total loss always changes.
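A short way to make this precise: for any new datapoint the softmax probability of the correct class satisfies $p_{y_i} < 1$, because the denominator always contains the other classes' strictly positive exponentials, so

$$L_i^{\text{softmax}} = -\log p_{y_i} > 0,$$

whereas every SVM hinge term $\max(0,\, f_j - f_{y_i} + 1)$ can be exactly zero when all margins exceed 1, in which case adding that datapoint leaves the total SVM loss unchanged.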