I found the gradient derivations for the SVM and softmax losses to be the hardest part of this assignment. Even after consulting some reference code I still found them difficult to follow, and there seems to be little explanation online, so I'm posting my own notes here as a reference.
First, the reference code:
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.
    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    num_train = X.shape[0]
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    scores = np.dot(X, W)   # (num_train, num_classes) score matrix
    correct_class_scores = scores[np.arange(num_train), y]
    correct_class_scores = np.reshape(correct_class_scores, (num_train, -1))
    margin = scores - correct_class_scores + 1.0  # delta = 1
    margin[np.arange(num_train), y] = 0.0         # correct class contributes no margin
    margin[margin <= 0] = 0.0  # ** #             # hinge: max(0, margin)
    loss += np.sum(margin)
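As a sanity check on the vectorized margin computation, the snippet below runs it on tiny made-up inputs (the shapes and values of `X`, `W`, `y` are my own assumptions, not from the assignment data) and compares the result against a straightforward double loop:

```python
import numpy as np

# Hypothetical tiny inputs: 3 samples, 4 features, 5 classes.
np.random.seed(0)
X = np.random.randn(3, 4)
W = np.random.randn(4, 5)
y = np.array([0, 2, 4])
num_train = X.shape[0]

# Vectorized hinge margins, mirroring the code above.
scores = X.dot(W)                                         # (3, 5)
correct = scores[np.arange(num_train), y].reshape(num_train, 1)
margin = scores - correct + 1.0                           # delta = 1
margin[np.arange(num_train), y] = 0.0                     # skip the correct class
margin[margin <= 0] = 0.0                                 # hinge: max(0, .)
loss_vec = np.sum(margin) / num_train

# Naive double loop for comparison.
loss_naive = 0.0
for i in range(num_train):
    s = X[i].dot(W)
    for j in range(W.shape[1]):
        if j == y[i]:
            continue
        loss_naive += max(0.0, s[j] - s[y[i]] + 1.0)
loss_naive /= num_train

print(abs(loss_vec - loss_naive) < 1e-10)  # True
```

The key trick is fancy indexing with `scores[np.arange(num_train), y]`, which picks out each row's correct-class score in one shot; the reshape to `(num_train, 1)` then lets broadcasting subtract it from every column of `scores`.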