CS231n_2020 (2): Linear Classification
Intro to Linear classification
As discussed in the first section, kNN-based image classification is very expensive. We therefore develop a more powerful approach, one that will later extend naturally to neural networks and convolutional neural networks. It has two main components: the first is a score function that maps the raw data to class scores, together with a loss function; the second is to recast this as an optimization problem, i.e. finding the parameters that minimize the loss function.
Linear score function
Assume a training set of images $x_i \in R^D$, each paired with a label $y_i$, where $i = 1 \dots N$ and $y_i \in \{ 1 \dots K \}$. That is, we have N examples (each of dimensionality D) and K distinct classes. For example, in CIFAR-10 we have N = 50,000 training images, each with D = 32 x 32 x 3 = 3072 pixels, and K = 10 classes.
We then define a score function $f: R^D \mapsto R^K$ that maps the raw image pixels to class scores.
Linear classifier
$$f(x_i, W, b) = W x_i + b$$
In CIFAR-10, $x_i$ contains all the pixels of the i-th image flattened into a single [3072 x 1] column vector, W is a [10 x 3072] matrix, and b is a [10 x 1] vector.
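As a quick sanity check of these shapes, here is a minimal NumPy sketch of the score computation for one CIFAR-10-sized image; the arrays below are randomly generated placeholders, not trained parameters:

```python
import numpy as np

# hypothetical, randomly initialized parameters (not a trained classifier)
W = np.random.randn(10, 3072) * 0.001  # [K x D] weights
b = np.zeros((10, 1))                  # [K x 1] biases
x_i = np.random.randn(3072, 1)         # one flattened 32x32x3 image, [D x 1]

scores = W.dot(x_i) + b                # [10 x 1], one score per class
print(scores.shape)                    # (10, 1)
```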
Interpreting a linear classifier
Loss function
Multiclass SVM
Writing the score of the j-th class as $s_j = f(x_i, W)_j$, the Multiclass SVM loss for the i-th example is:
$$L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$
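For a concrete example (the numbers are made up for illustration): suppose the scores are $s = [13, -7, 11]$, the first class is the correct one ($y_i = 0$), and $\Delta = 10$. Then $L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) = 0 + 8 = 8$: only the third class violates the desired margin and contributes to the loss.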
Adding an L2 regularization penalty $R(W) = \sum_k\sum_l W_{k,l}^2$ gives the complete Multiclass SVM loss:
$$L = \underbrace{ \frac{1}{N} \sum_i L_i }_\text{data loss} + \underbrace{ \lambda R(W) }_\text{regularization loss}$$
That is,

$$L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k\sum_l W_{k,l}^2$$
Code:
```python
import numpy as np

def L_i(x, y, W):
  """
  unvectorized version. Compute the multiclass svm loss for a single example (x,y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all wrong classes
    if j == y:
      # skip for the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider margin on max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i

def L(X, y, W):
  """
  fully-vectorized implementation :
  - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
  - y is array of integers specifying correct class (e.g. 50,000-D array)
  - W are weights (e.g. 10 x 3073)
  """
  # evaluate loss over all examples in X without using any for loops
  # left as exercise to reader in the assignment
  pass
```
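To tie the per-example functions above back to the full regularized objective, here is a minimal sketch that still loops over the columns of X (so it does not give away the fully-vectorized exercise). The function name `full_loss_naive` and the regularization strength `lam` are hypothetical placeholders:

```python
import numpy as np

def full_loss_naive(X, y, W, lam=0.1):
  """
  Sketch: average the per-example SVM losses over the N columns of X
  and add the L2 penalty lambda * sum_k sum_l W_{k,l}^2.
  Assumes L_i_vectorized from the block above; lam is a made-up value.
  """
  N = X.shape[1]
  data_loss = sum(L_i_vectorized(X[:, i:i+1], y[i], W) for i in range(N)) / N
  reg_loss = lam * np.sum(W * W)
  return data_loss + reg_loss
```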
Softmax classifier
softmax function:
$$f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$
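A minimal sketch (with made-up scores) showing that softmax squashes arbitrary real-valued scores into a probability distribution that sums to 1:

```python
import numpy as np

z = np.array([1.0, 2.0, 3.0])        # hypothetical class scores
p = np.exp(z) / np.sum(np.exp(z))    # softmax probabilities
print(p)                             # approx. [0.09, 0.24, 0.67]
print(p.sum())                       # 1.0
```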
The loss for the i-th example is then

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}$$
Cross-entropy: the cross-entropy between a true distribution p and an estimated distribution q is defined as

$$H(p,q) = - \sum_x p(x) \log q(x)$$

so the Softmax classifier is minimizing the cross-entropy between the estimated class probabilities and the true distribution, which here puts all of its mass on the correct class ($p = [0, \dots, 1, \dots, 0]$).
Code
```python
import numpy as np

f = np.array([123, 456, 789]) # example with 3 classes and each having large scores
p = np.exp(f) / np.sum(np.exp(f)) # Bad: Numeric problem, potential blowup

# instead: first shift the values of f so that the highest number is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer
```
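Building on the shift trick above, here is a minimal sketch of the per-example Softmax loss; the function name `softmax_loss_i`, the score vector `f`, and the class index `y` are made-up placeholders:

```python
import numpy as np

def softmax_loss_i(f, y):
  """Compute L_i = -log(e^{f_y} / sum_j e^{f_j}) using the max-shift for stability."""
  f = f - np.max(f)                          # shift so the largest score is 0
  log_probs = f - np.log(np.sum(np.exp(f)))  # log of the softmax probabilities
  return -log_probs[y]                       # negative log-probability of the correct class

f = np.array([3.2, 5.1, -1.7])               # hypothetical class scores
print(softmax_loss_i(f, y=0))                # loss when class 0 is the correct one
```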
SVM vs Softmax
Interactive Web Demo of Linear Classification
Summary
- Defined a score function mapping image pixels to class scores (a linear function in this note).
- Unlike kNN classification, once the parameters are learned the training set can be discarded, and new images can be classified quickly.
- Introduced two loss functions commonly used with linear classifiers: SVM and Softmax. The smaller the loss, the better the classifier's predictions on the training data.