While working through the SVM part of CS231n 2020 Assignment 1, I ran into the problem of implementing the gradient (derivative) of the hinge loss in code, so I am writing it down here.
First, the expression for the hinge loss in the multiclass setting:
$$L_i=\sum_{j\neq y_i}\max\left(0,\; w_j^T x_i-w_{y_i}^T x_i+\Delta\right)$$
where $\Delta=1$.
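As a quick sanity check on this expression, here is a tiny numeric example; the scores and the three-class setup below are made up purely for illustration:

```python
import numpy as np

# Hypothetical scores for one example with 3 classes; assume the correct class is y_i = 0.
scores = np.array([3.2, 5.1, -1.7])
delta = 1.0
margins = np.maximum(0, scores - scores[0] + delta)
margins[0] = 0  # the j == y_i term is excluded from the sum
loss_i = margins.sum()
# max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9
```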
Taking the partial derivative with respect to $w$ gives the final result:

$$\frac{\partial L_i}{\partial w_j}=\mathbb{1}\left(w_j^T x_i-w_{y_i}^T x_i+\Delta>0\right)x_i$$
$$\frac{\partial L_i}{\partial w_{y_i}}=-\left(\sum_{j\neq y_i}\mathbb{1}\left(w_j^T x_i-w_{y_i}^T x_i+\Delta>0\right)\right)x_i$$

where $\mathbb{1}(\cdot)$ is the indicator function: 1 when the condition inside holds, 0 otherwise.
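Before the assignment code, here is a minimal sketch of how the two formulas map onto numpy for a single example; the helper name `example_grad` and the shapes `x_i: (D,)`, `W: (D, C)` are my own assumptions, not part of the assignment's API:

```python
import numpy as np

def example_grad(W, x_i, y_i, delta=1.0):
    # Gradient of L_i with respect to W for a single example (x_i, y_i).
    scores = x_i.dot(W)                          # (C,)
    margins = scores - scores[y_i] + delta       # w_j^T x_i - w_{y_i}^T x_i + delta
    indicator = (margins > 0).astype(float)      # 1(margin > 0) for each class j
    indicator[y_i] = 0.0                         # the j == y_i term is excluded
    dW_i = np.outer(x_i, indicator)              # column j gets 1(...) * x_i
    dW_i[:, y_i] = -indicator.sum() * x_i        # column y_i gets -(sum of indicators) * x_i
    return dW_i
```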
The final code is shown below:
```python
import numpy as np

def svm_loss_naive(W, X, y, reg):
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # scores has shape (num_classes,), e.g. (10,)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i]  # d(L_i)/d(w_{y_i}) accumulates -x_i
                dW[:, j] += X[i]     # d(L_i)/d(w_j) accumulates +x_i

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * np.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dW /= num_train
    dW += 2 * reg * W  # gradient of the reg * sum(W * W) term
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return loss, dW
```
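The assignment also asks for a fully vectorized version in `svm_loss_vectorized`. The sketch below is just one way to write it under the same `(W, X, y, reg)` interface, following the formulas above; it is not the official solution:

```python
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    num_train = X.shape[0]
    scores = X.dot(W)                                    # (N, C)
    correct = scores[np.arange(num_train), y][:, None]   # (N, 1) correct-class scores
    margins = np.maximum(0, scores - correct + 1)        # note delta = 1
    margins[np.arange(num_train), y] = 0                 # drop the j == y_i terms
    loss = margins.sum() / num_train + reg * np.sum(W * W)

    # Each positive margin contributes +x_i to column j and -x_i to column y_i,
    # exactly as in the two derivative formulas above.
    coeff = (margins > 0).astype(float)                  # (N, C) indicator matrix
    coeff[np.arange(num_train), y] = -coeff.sum(axis=1)
    dW = X.T.dot(coeff) / num_train + 2 * reg * W
    return loss, dW
```

A quick way to convince yourself it is right is to compare its loss and dW against the naive loop version on a small random batch.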