一、A few formulas & a few sentences describing the SVM
(一) Scores and the Loss Function
The score of the i-th sample on the j-th class:
$$s_j = f(x_i, W)_j$$
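In code this is just a dot product. A minimal sketch (the shapes here are made up, but match the (N, D) data and (D, C) weights used below):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 4)   # N=5 samples, D=4 features
W = np.random.randn(4, 3)   # C=3 classes
i = 2
scores_i = X[i].dot(W)      # shape (C,); scores_i[j] is s_j = f(x_i, W)_j
```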
The loss for the i-th sample:
$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$
In summary, the SVM loss function wants the score of the correct class $y_i$ to be larger than the incorrect class scores by at least $\Delta$ (delta). If this is not the case, we will accumulate loss.
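A tiny worked example (the numbers are made up; $\Delta = 1$ and the correct class is 0):

```python
s = [3.2, 5.1, -1.7]    # scores; s[0] is the correct class score
L_i = max(0, s[1] - s[0] + 1) + max(0, s[2] - s[0] + 1)
# = max(0, 2.9) + max(0, -3.9) ≈ 2.9: only the second class, which beats
# the correct score by more than the margin, contributes to the loss.
print(L_i)   # ≈ 2.9
```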
(二) The Gradient Formula
How do we compute the gradient? With the chain rule.
1. Without regularization
- First differentiate $L_i$ with respect to $t = s_j - s_{y_i} + \Delta$, i.e. take the derivative of the $\max(0, \cdot)$ function: $dL_i/dt = 0$ (when $t < 0$), or $= 1$ (when $t > 0$).
- Then differentiate $t = s_j - s_{y_i} + \Delta$ with respect to $W$. Since $t = X_i W_j - X_i W_{y_i} + \Delta$, we get $dt/dW_j = X_i$ and $dt/dW_{y_i} = -X_i$.
- Combine the two steps. How, exactly? For a column of the gradient matrix, are both gradient vectors used, or does column $j$ take only the gradient with respect to $W_j$, and column $y_i$ only the gradient with respect to $W_{y_i}$?
Recall your calculus: the chain rule sums the contribution of every path, so $dL_i/dW$ picks up $dt/dW_j$ in column $j$ *and* $dt/dW_{y_i}$ in column $y_i$. Both gradient vectors are used (see the sketch after this list).
I got this wrong here at first too; posting a link by way of thanks:
https://blog.csdn.net/silent_crown/article/details/78109461
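To convince yourself that both columns really do receive updates, here is a minimal finite-difference check for a single example (all names and data here are made up for illustration):

```python
import numpy as np

def loss_i(W, x, yi, delta=1.0):
    scores = x.dot(W)
    margins = np.maximum(0, scores - scores[yi] + delta)
    margins[yi] = 0          # the sum runs over j != y_i
    return margins.sum()

def grad_i(W, x, yi, delta=1.0):
    scores = x.dot(W)
    dW = np.zeros_like(W)
    for j in range(W.shape[1]):
        if j == yi:
            continue
        if scores[j] - scores[yi] + delta > 0:   # dL_i/dt = 1 on this term
            dW[:, j] += x    # dt/dW_j = X_i
            dW[:, yi] -= x   # dt/dW_{y_i} = -X_i: both columns are updated
    return dW

np.random.seed(0)
W, x, yi = np.random.randn(5, 3), np.random.randn(5), 1
num, h = np.zeros_like(W), 1e-5
for k in range(W.size):
    Wp, Wm = W.copy(), W.copy()
    Wp.flat[k] += h
    Wm.flat[k] -= h
    num.flat[k] = (loss_i(Wp, x, yi) - loss_i(Wm, x, yi)) / (2 * h)
print(np.abs(num - grad_i(W, x, yi)).max())   # should be tiny, ~1e-10
```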
2. With regularization
With regularization, the loss computation becomes:
$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta) + \lambda \sum_k \sum_l W_{k,l}^2$$
(Note that the regularization term is added once to the full loss $L$, not to each per-example $L_i$; this matches the code below.)
Correspondingly, the gradient matrix also gains the term $2\lambda W$ (the derivative of $\lambda \sum W^2$; mind the factor of 2).
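A short sketch of that term and its gradient (reg plays the role of $\lambda$; the shape is arbitrary):

```python
import numpy as np

reg = 1e-3                      # lambda, the regularization strength
W = np.random.randn(4, 3)
reg_loss = reg * np.sum(W * W)  # lambda * sum(W^2)
dW_reg = 2 * reg * W            # its gradient: note the factor of 2
```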
二、Hand-writing the gradient code
```python
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)      # initialize the gradient as zero
    num_classes = W.shape[1]    # number of columns (C)
    num_train = X.shape[0]      # number of rows (N)
    loss = 0.0
    for i in range(num_train):
        scores = np.dot(X[i, :].reshape(1, -1), W)   # shape (1, C)
        correct_class_score = scores[0, y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue                              # skip the correct class
            margin = scores[0, j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i]   # column y_i accumulates -X_i
                dW[:, j] += X[i]      # column j accumulates +X_i
    dW /= num_train
    dW += 2 * reg * W  # note: average first, THEN add the gradient of the
                       # regularization term (factor 2: loss uses reg*sum(W*W))
    loss /= num_train
    loss += reg * np.sum(W * W)
    return loss, dW
```
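An example call (the shapes follow the docstring; the data is random, just to exercise the function):

```python
np.random.seed(0)
W = np.random.randn(3073, 10) * 0.0001   # near-zero weights
X = np.random.randn(50, 3073)
y = np.random.randint(10, size=50)
loss, grad = svm_loss_naive(W, X, y, reg=1e-5)
print(loss, grad.shape)   # grad has the same shape as W: (3073, 10)
```

With near-zero weights every margin is about $\Delta = 1$, so the loss should come out close to $C - 1 = 9$, which makes a handy sanity check.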
三、Two small numpy tricks
Taking one specified column element from each row of a numpy array, forming a column vector:
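A minimal sketch of the trick (scores_yi echoes the name used in the snippets below; the data is made up):

```python
import numpy as np

scores = np.arange(12).reshape(4, 3)   # pretend (N=4, C=3) score matrix
y = np.array([0, 2, 1, 1])             # correct class index per row
scores_yi = scores[np.arange(4), y].reshape(-1, 1)   # one element per row
print(scores_yi.ravel())               # [ 0  5  7 10]
```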
Initializing a whole numpy matrix to one and the same value, two ways:
```python
npdelta = delta * np.ones(scores_yi.shape)   # marginally faster, by very, very little
```
or
```python
npdelta = np.zeros(scores_yi.shape)
npdelta[:] = delta
```
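For completeness, numpy also has a single call for this; np.full should give the same result:

```python
npdelta = np.full(scores_yi.shape, delta)
```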