cs231n Assignment 1: SVM Linear Classifier
These are my own notes and reflections on the course material. If you spot any mistakes, corrections are welcome; if anything here infringes on your rights, please let me know.
I. Preliminaries
- Linear classification (which essentially amounts to template matching): f(x, W) = Wx + b
In this assignment every input image is 32×32×3 (= 3072 values) and belongs to exactly one of 10 classes. f(x, W) is a 10×1 column vector holding the scores of image x for the 10 classes; W is a 10×3072 matrix; x is a 3072×1 vector obtained by stretching all the pixels of one image into a column; b is a 10×1 constant vector (the bias term), which never interacts with the training data and only supplies some data-independent preferences for particular classes.
- SVM loss function
The multiclass SVM (hinge) loss for the i-th example is L_i = Σ_{j≠y_i} max(0, s_j − s_{y_i} + Δ),
where x_i is the input image and y_i its true class, f(x_i, W) is the vector of predicted class scores for x_i, s denotes those per-class scores, s_{y_i} is the score of the true class, and s_j is the score of any other class. Δ is a safety margin (the code below uses Δ = 1, but other values are possible). Plotted against the true-class score, the loss falls off linearly as that score increases and drops to exactly 0 once the true-class score exceeds every other score by at least Δ, i.e. once the example is classified correctly with enough margin.
- A few questions about the loss function
Q1: The minimum of the SVM loss is 0 and the maximum is unbounded (infinity).
Q2: If W is initialized with very small values so that all scores s are close to 0, then each L_i is close to n − 1 (n being the number of classes), because each of the n − 1 incorrect classes contributes a margin of roughly 1. This makes a useful sanity check when debugging.
Q3: If the sum ran over all classes (including j = y_i), the loss would simply increase by 1, because the j = y_i term is max(0, 0 + 1) = 1.
Q4: Using the mean over classes instead of the sum changes nothing essential; it only rescales the loss by a constant factor.
Q5: Squaring the max term gives a different, non-linear measure (the squared hinge loss), which magnifies large violations quadratically; whether or not to square is a choice to make for the specific problem at hand.
- Regularization of the loss
The loss so far only looks at the training set, which can lead to overfitting, while what we actually want is a model that predicts well on the test set.
To address this we add a term λR(W) to the objective. It expresses a preference for the simplest model; λ is a hyperparameter that can be chosen by cross-validation, and R(W) commonly takes forms such as the L2 norm R(W) = Σ_k Σ_l W_{k,l}² or the L1 norm R(W) = Σ_k Σ_l |W_{k,l}|.
As an example, take W1 = [1, 0, 0, 0] and W2 = [0.25, 0.25, 0.25, 0.25] with input x = [1, 1, 1, 1]: both give the same dot product with x (namely 1).
With L2 regularization we prefer W2, because its weight is spread almost evenly across all elements rather than concentrating, as W1 does, on one or a few particular elements.
With L1 regularization we may instead prefer W1 (in this example both have L1 norm 1, but the intuition behind L1 is that more zeros means simpler; in other words, L1 favors sparse weight matrices).
So L1 and L2 simply measure "simplicity" by different standards.
- Gradient computation
For me the hard part of this assignment was computing the gradient, which took a while to get straight; a compact statement of the result is sketched right after this list.
For a fuller walk-through of the derivation, see the write-up SVM_LOSS梯度推导.
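As a reference for the code that follows, here is my own summary of the per-example gradient implied by the hinge loss above (w_j denotes the j-th column of W, so s_j = w_j^T x_i):

$$\frac{\partial L_i}{\partial w_j} = \mathbb{1}\!\left(s_j - s_{y_i} + \Delta > 0\right) x_i \quad (j \neq y_i), \qquad \frac{\partial L_i}{\partial w_{y_i}} = -\Big(\sum_{j \neq y_i} \mathbb{1}\!\left(s_j - s_{y_i} + \Delta > 0\right)\Big)\, x_i$$

In words: every class whose margin is violated adds x_i to its own column of dW and subtracts x_i from the correct class's column. This is exactly what both implementations below accumulate.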
II. Assignment Code
Open cs231n/classifiers/linear_svm.py:
from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                # every violated margin subtracts X[i] from the correct class's
                # column of dW and adds X[i] to the incorrect class's column
                dW[:, y[i]] += -X[i, :]
                dW[:, j] += X[i, :]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    loss += 0.5 * reg * np.sum(W * W)
    dW += reg * W

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather than first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # pass (the gradient is accumulated in the loop above)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = X.shape[0]
    num_classes = W.shape[1]
    scores = X.dot(W)                                       # (N, C) score matrix
    correct_class_score = scores[np.arange(num_train), y]   # score of the true class
    correct_class_score = np.reshape(correct_class_score, (num_train, -1))
    margins = scores - correct_class_score + 1              # delta = 1
    margins = np.maximum(0, margins)
    margins[np.arange(num_train), y] = 0                    # do not count j == y[i]
    loss += np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # turn margins into an indicator matrix: 1 where a margin was violated,
    # and -(number of violations) at the correct-class position of each row
    margins[margins > 0] = 1
    row_sum = np.sum(margins, axis=1)
    margins[np.arange(num_train), y] = -row_sum.T
    dW = np.dot(X.T, margins)
    dW /= num_train
    dW += reg * W

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW
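Not part of the assignment file itself, but a quick way to convince yourself that the two implementations agree is to run both on small random inputs and compare the results. This is a minimal sketch: the shapes follow the assignment's bias-trick convention of D = 3072 + 1, and the specific numbers are arbitrary.

```python
import numpy as np
from cs231n.classifiers.linear_svm import svm_loss_naive, svm_loss_vectorized

np.random.seed(0)
W = 0.0001 * np.random.randn(3073, 10)   # (D, C) weights
X = np.random.randn(5, 3073)             # a tiny "minibatch" of 5 examples
y = np.random.randint(10, size=5)        # their labels

loss_naive, grad_naive = svm_loss_naive(W, X, y, reg=5e-6)
loss_vec, grad_vec = svm_loss_vectorized(W, X, y, reg=5e-6)

print('loss difference: %e' % abs(loss_naive - loss_vec))                        # should be ~0
print('grad difference: %e' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))  # should be ~0
```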
Open cs231n/classifiers/linear_classifier.py:
from __future__ import print_function
from builtins import range
from builtins import object
import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *
from past.builtins import xrange
class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in           #
            # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
            # and y_batch should have shape (batch_size,)                           #
            #                                                                       #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            batch_inx = np.random.choice(num_train, batch_size)
            X_batch = X[batch_inx, :]
            y_batch = y[batch_inx]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            self.W = self.W - learning_rate * grad

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history
    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        score = X.dot(self.W)
        y_pred = np.argmax(score, axis=1)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        Compute the loss function and its derivative.
        Subclasses will override this.

        Inputs:
        - X_batch: A numpy array of shape (N, D) containing a minibatch of N
          data points; each point has dimension D.
        - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
        - reg: (float) regularization strength.

        Returns: A tuple containing:
        - loss as a single float
        - gradient with respect to self.W; an array of the same shape as W
        """
        pass


class LinearSVM(LinearClassifier):
    """ A subclass that uses the Multiclass SVM loss function """

    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
    """ A subclass that uses the Softmax + Cross-entropy loss function """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
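For context, this is roughly how the class gets used. The training and validation arrays below are random stand-ins (in the assignment notebook they would be the preprocessed CIFAR-10 splits, mean-subtracted and with the bias dimension appended), and the hyperparameter values are only illustrative.

```python
import numpy as np
from cs231n.classifiers.linear_classifier import LinearSVM

# Stand-in data with the assignment's shapes; in the notebook these would be the
# preprocessed CIFAR-10 splits (mean image subtracted, bias dimension appended).
X_train = np.random.randn(500, 3073)
y_train = np.random.randint(10, size=500)
X_val = np.random.randn(100, 3073)
y_val = np.random.randint(10, size=100)

svm = LinearSVM()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=500, verbose=True)

y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % np.mean(y_val == y_val_pred))
```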
Finally, we can see that the accuracy reaches roughly 37.6%, which beats the kNN classifier implemented in the previous section.