CS231n Assignment: Q1-2 Support Vector Machine (SVM)

The SVM Algorithm

Omitted…

Steps

  • 1. Define a loss function that quantifies our unhappiness with the scores on the training data.
  • 2. Come up with an efficient way to find the parameters that minimize the loss function.
    SVM loss: for every incorrect class, take that class's score minus the correct class's score plus 1, compare the result with 0 and keep the larger of the two, then sum these values over all incorrect classes.

Compute the scores:
$$s = f(x, W) = Wx$$
Compute the full loss (with the regularization term):
$$L = \frac{1}{N}\sum_{i=1}^{N}\sum_{j \neq y_i} \max\left(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + 1\right) + \lambda R(W)$$
$$R(W) = \sum_{k}\sum_{l} W_{k,l}^2$$
Gradient computation (analytic):
From $L_i = \sum_{j \neq y_i} \max(0,\; W_j X_i^T - W_{y_i} X_i^T + 1)$, i.e. $L_i = \sum_{j \neq y_i} \max(0,\; S_j - S_{y_i} + 1)$, we see that whenever a margin term is $\le 0$ its gradient contribution is 0; only a margin $> 0$ contributes:
For $j \neq y_i$: $\frac{\partial L_i}{\partial W_j} = X_i^T$; for $j = y_i$: $\frac{\partial L_i}{\partial W_{y_i}} = -X_i^T$ (accumulated once for every class whose margin is violated).
Finally, average over the training examples and add the regularization term.
So in the svm_loss_naive code below:

if margin > 0: 
	dW[:,y[i]] += -X[i,:]
	dW[:,j] += X[i,:]

dW /= num_train  # average over the training examples
dW += reg * W  # add the regularization gradient

Example

1. Compute the scores and the corresponding per-example losses. (figure)
2. Compute the average loss. (figure)
3. Add the regularization term. (figure)
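Since the figures for this worked example are not reproduced here, the following is a minimal numeric sketch (made-up scores for a single example, three classes, delta = 1) of how the per-example hinge loss above is computed:

import numpy as np

# Made-up scores for one example over three classes; class 0 is the correct label.
scores = np.array([3.2, 5.1, -1.7])
y_i = 0

# Multiclass hinge loss: sum over the incorrect classes of max(0, s_j - s_{y_i} + 1)
margins = np.maximum(0, scores - scores[y_i] + 1)
margins[y_i] = 0              # the correct class does not contribute
L_i = margins.sum()           # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9 + 0
print(L_i)                    # 2.9

Averaging such per-example losses over the training set and adding the penalty R(W) gives the full loss above.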

CS231n: SVM Assignment

Multiclass Support Vector Machine exercise

In this exercise you will:
• implement a fully-vectorized loss function for the SVM
• implement the fully-vectorized expression for its analytic gradient
• check your implementation using numerical gradient checking
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with SGD
• visualize the final learned weights

CIFAR-10 Data Loading and Preprocessing

The dataset used here is CIFAR-10. Load and preprocess the data:

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issues)
try:
   del X_train, y_train
   del X_test, y_test
   print('Clear previously loaded data.')
except:
   pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

From the sizes of the training and test data we can see that each image is 32 x 32 x 3 pixels, with 50,000 images in the training set and 10,000 in the test set.

Output:

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

Here are a few example images from each class:

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

Output:
(figure: a grid of example training images, seven per class)
To run the code more efficiently, split the data into a training set, a validation set, and a test set (the first 49,000 training images become the training set, the next 1,000 become the validation set, and the first 1,000 test images become the test set). In addition we create a small development set as a subset of the training data, which we can use during development so the code runs faster:

num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
# i.e. images 49000-49999 of the original training set become the validation set
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
# i.e. the first 49000 images become the training set
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
# Randomly sample num_dev indices from the 49000 training examples, without replacement, for the development set
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
# i.e. the first 1000 images of the original test set become our test set
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Output:
The training set has 49,000 images, the validation set 1,000, and the test set 1,000.

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

Preprocessing: reshape each image into a single row

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

Output:
Taking X_train.shape as an example: the first dimension is X_train.shape[0], i.e. 49000, and the -1 in the second position tells NumPy to infer the number of columns from the remaining dimensions, i.e. 32 x 32 x 3 = 3072, so the final shape is (49000, 3072). As a sanity check, the printed shapes are:

Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)
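
As a quick check of the -1 semantics (a toy array, not the CIFAR data), NumPy infers the missing dimension from the remaining sizes, which is how 32 x 32 x 3 collapses to 3072 columns:

import numpy as np

toy = np.zeros((5, 32, 32, 3))              # five fake "images"
flat = np.reshape(toy, (toy.shape[0], -1))  # -1 lets NumPy infer 32*32*3 = 3072
print(flat.shape)                           # (5, 3072)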

Further preprocessing: subtract the mean image.
First, compute the mean image from the training data:

# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)  # mean over rows, i.e. the mean of each pixel column
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

Output:

[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082
 131.75402041 130.96055102 136.14328571 132.47636735 131.48467347]

(figure: visualization of the mean image)
Second, subtract the mean image from the train and test data:

# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

Third, append a bias dimension of ones (the bias trick) so that the SVM only has to optimize a single weight matrix W.

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
# Stack the arrays horizontally (column-wise): append a column of ones, np.ones((X_train.shape[0], 1)), after the last column to act as the bias feature
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)

Output:

(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
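
Why appending a column of ones works (a small sketch with toy shapes, not the CIFAR arrays): the affine score Wx + b turns into a single matrix product once b is stacked onto W and each x gets a trailing 1.

import numpy as np

D, C = 4, 3
x = np.random.randn(D)
W = np.random.randn(D, C)
b = np.random.randn(C)

x_ext = np.hstack([x, 1.0])   # shape (D+1,): the input with a trailing 1
W_ext = np.vstack([W, b])     # shape (D+1, C): the bias stacked as an extra row

print(np.allclose(x.dot(W) + b, x_ext.dot(W_ext)))  # True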

SVM Classifier

All of the code for this section is written in cs231n/classifiers/linear_svm.py.
Implement the naive (with loops) structured SVM loss function.
Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.
Inputs:

  • W: a numpy array of shape (D, C) containing weights
  • X: a numpy array of shape (N, D) containing a minibatch of data
  • y: a numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C
  • reg: (float) regularization strength

Returns a tuple of:

  • loss as a single float
  • gradient with respect to the weights W; an array of the same shape as W

from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange

def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape) # initialize the gradient as zero, shape (D, C)

    # compute the loss and the gradient
    num_classes = W.shape[1]  # number of classes C
    num_train = X.shape[0]  # number of training examples N
    loss = 0.0
    for i in range(num_train):  # loop over the training examples
        scores = X[i].dot(W)  # score vector (1 x C): s = f(x_i, W)
        correct_class_score = scores[y[i]]  # score of this example's correct class (a scalar)
        for j in range(num_classes):  # loop over the classes
            if j == y[i]:  # skip the correct class
                continue
            # otherwise compute the SVM margin for this class; note delta = 1,
            # and every j != y_i contributes via S_j - S_{y_i} + 1
            margin = scores[j] - correct_class_score + 1  # a scalar margin
            if margin > 0:  # when the argument of max(...) is <= 0 the gradient stays 0 (its initial value), so only margin > 0 matters
                loss += margin  # accumulate this margin into the loss
                # gradient: differentiate the margin with respect to W
                # d(X_i W_j - X_i W_yi + 1)/dW_yi = -X_i,
                # so subtract X_i from the correct-class column dW[:, y[i]]
                dW[:,y[i]] += -X[i,:]
                # d(X_i W_j - X_i W_yi + 1)/dW_j = X_i,
                # so add X_i to the column dW[:, j] of class j
                dW[:,j] += X[i,:]

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train  # average loss over the training examples
    dW /= num_train  # average the gradient as well

    # Add regularization to the loss.
    loss += reg * np.sum(W * W)  # add regularization to get the full loss
    dW += reg * W  # add the regularization gradient

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it dW.                #
    # Rather that first computing the loss and then computing the derivative,   #
    # it may be simpler to compute the derivative at the same time that the     #
    # loss is being computed. As a result you may need to modify some of the    #
    # code above to compute the gradient.                                       #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****


    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    return loss, dW

As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.

# Evaluate the naive implementation of the loss we provided for you
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))

Output:

loss: 9.112255

Derive and implement the gradient of the SVM cost function, implementing it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.
To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient that you computed. We have provided code that does this for you:

# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient.
# The numbers should match almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on

# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)

Output, checking whether the numerical gradient and the analytic gradient agree:

numerical: 16.558587 analytic: 16.558587, relative error: 7.044564e-12
numerical: -1.877586 analytic: -1.877586, relative error: 7.754054e-11
numerical: -17.992739 analytic: -17.992739, relative error: 9.499249e-12
numerical: 26.182227 analytic: 26.182227, relative error: 9.835010e-12
numerical: -54.546606 analytic: -54.546606, relative error: 1.438603e-11
numerical: 18.977124 analytic: 18.977124, relative error: 2.583222e-12
numerical: 20.062216 analytic: 20.062216, relative error: 1.051530e-12
numerical: 18.542379 analytic: 18.542379, relative error: 5.079020e-12
numerical: 26.232349 analytic: 26.232349, relative error: 6.182360e-12
numerical: -31.468373 analytic: -31.468373, relative error: 1.159591e-11
numerical: -48.611978 analytic: -48.617267, relative error: 5.440557e-05
numerical: 1.501071 analytic: 1.508248, relative error: 2.385005e-03
numerical: -7.296033 analytic: -7.294829, relative error: 8.252591e-05
numerical: -0.337898 analytic: -0.347751, relative error: 1.437046e-02
numerical: 24.106606 analytic: 24.120138, relative error: 2.806025e-04
numerical: 3.226319 analytic: 3.227376, relative error: 1.636281e-04
numerical: 36.533221 analytic: 36.539594, relative error: 8.722431e-05
numerical: -19.570135 analytic: -19.564052, relative error: 1.554484e-04
numerical: -40.827530 analytic: -40.827961, relative error: 5.273252e-06
numerical: -10.187518 analytic: -10.186827, relative error: 3.389178e-05

Inline Question 1

Sometimes a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is, strictly speaking, not differentiable.

Your Answer: Because the SVM loss function is, strictly speaking, not differentiable: the hinge max(0, ·) has a kink at zero, so when some margin lies close to zero the numerical gradient (computed across the kink) can disagree with the analytic one. This is not a cause for concern.
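
For intuition on the one-dimensional case mentioned in the question (a standalone sketch, not part of the assignment code): a centered numerical gradient of f(x) = max(0, x) evaluated right at the kink disagrees with the analytic (sub)gradient, which is exactly what can happen when some margin sits close to zero.

import numpy as np

f = lambda x: np.maximum(0.0, x)   # the hinge in one dimension

def numerical_grad(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.0                            # right at the kink
analytic = 0.0                     # common convention: gradient of max(0, x) is 0 for x <= 0
print(numerical_grad(f, x))        # 0.5, because the centered difference straddles the kink
print(analytic)                    # 0.0, so the check "fails" even though nothing is wrong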

Next, implement the function svm_loss_vectorized; for now only compute the loss; we will implement the gradient in a moment.

def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape) # initialize the gradient as zero
    scores = X.dot(W)  # scores matrix of shape (N, C)
    num_train = X.shape[0]
    
    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    # The first index, np.arange(num_train), selects every row in turn;
    # the second index, y, selects for each row the column of its correct label.
    # So we get, for every image, the score of its correct class.
    correct_scores = scores[np.arange(num_train), y]  # shape (N,)
    correct_scores = correct_scores.reshape((num_train, -1))  # shape (N, 1) for broadcasting
    margins = np.maximum(0, scores - correct_scores + 1)  # margins, shape (N, C)
    margins[range(num_train), y] = 0  # zero out the entries of the correct classes
    loss += np.sum(margins)
    loss /= num_train  # average over all training examples
    loss += reg * np.sum(W * W)  # add regularization
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    # Set the entries with margin > 0 (the ones that contributed to the loss) to 1; the rest stay 0
    margins[margins > 0] = 1  # shape (N, C)

    # The correct-class column also receives a gradient contribution: for each example
    # its entry is minus the number of classes whose margin was violated
    row_num = -np.sum(margins, 1)
    margins[np.arange(num_train), y] = row_num

    # X.T: (3073, N), margins: (N, C)  ->  dW: (3073, C)
    dW += np.dot(X.T, margins)  # shape (3073, 10) here
    dW /= num_train  # average over the training examples
    dW += reg * W  # regularization gradient
 
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW

# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))

Output:

Naive loss: 9.112255e+00 computed in 0.157111s
Vectorized loss: 9.112255e+00 computed in 0.004004s
difference: -0.000000
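
To make the fancy indexing used in svm_loss_vectorized concrete, here is a tiny sketch (made-up scores, three examples, four classes) of what scores[np.arange(num_train), y] picks out and how the margins follow:

import numpy as np

scores = np.array([[3.0, 2.5, 2.0, 0.5],
                   [0.2, 4.0, 3.8, 1.5],
                   [2.0, 0.0, 0.5, 3.0]])   # shape (N=3, C=4)
y = np.array([0, 1, 3])                      # correct class of each row

correct = scores[np.arange(3), y]            # row i, column y[i] -> [3.0, 4.0, 3.0]
margins = np.maximum(0, scores - correct.reshape(-1, 1) + 1)
margins[np.arange(3), y] = 0                 # zero out the correct-class entries
print(margins.sum() / 3)                     # average hinge loss for this toy batch, about 0.433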

# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)

Output:

Naive loss and gradient: computed in 0.150105s
Vectorized loss and gradient: computed in 0.004003s
difference: 0.000000

Stochastic Gradient Descent (SGD)

Inside a training loop, repeat the following steps as long as necessary:
(1) Draw a batch of training samples x and the corresponding targets y.
(2) Run the network on x (a forward pass) to obtain the predictions y_pred.
(3) Compute the loss of the network on this batch, a measure of the distance between y_pred and y.
(4) Update all the weights of the network in a way that slightly reduces the loss on this batch.
Eventually the network will have a very low loss on its training data, i.e. a small distance between the predictions y_pred and the expected targets y.

Adjust the parameters a little at a time, based on the loss on the current random batch of data. Because we are dealing with a differentiable function, we can compute its gradient, which gives an efficient way to implement step (4): move the weights in the opposite direction of the gradient, and the loss decreases a little each time.

Steps

1. Draw a batch of training samples x and the corresponding targets y.
2. Run the network on x to obtain the predictions y_pred.
3. Compute the loss of the network on this batch, a measure of the distance between y_pred and y.
4. Compute the gradient of the loss with respect to the network's parameters (a backward pass).
5. Move the parameters a little in the opposite direction of the gradient, e.g. W -= step * gradient, so that the loss on this batch decreases a bit.
(Text and figures are from the book Deep Learning with Python.)

We now have vectorized and efficient expressions for the loss and for the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.

In the file linear_classifier.py, implement SGD in the function LinearClassifier.train():

def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.
        训练这个线性分类器使用随机梯度下降。

        Inputs:
        - X: 包含训练数据的形状(N, D)的numpy数组;有N个每个维度D的训练样本。
        - y: 包含训练标签的形状(N,)的numpy数组;y[i]= c表示X[i]对于c类有标签 0 <= c < C for C classes.
        - learning_rate: (float)用于优化的学习率。
        - reg: (float)正则化强度。
        - num_iters: (整数)优化时要采取的步骤数
        - batch_size: (整数)在每个步骤中使用的训练示例的数量。
        - verbose: (boolean)如果为真,则在优化期间打印进度。

        Outputs:
        包含每次训练迭代时损失函数值的列表。
        """
        num_train, dim = X.shape  # 分别获取样本数量,以及特征数(纬度)
        num_classes = np.max(y) + 1 # 获取类的个数, 假设y取0…K-1,其中K是类的个数
        if self.W is None:
            # 延迟初始化W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):  # loop over the optimization steps
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in y_batch;  #
            # after sampling X_batch should have shape (batch_size, dim) and        #
            # y_batch should have shape (batch_size,)                               #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            i = np.random.choice(a=num_train, size=batch_size)  # sample batch_size indices from [0, num_train), with replacement
            X_batch = X[i,:]  # the sampled examples and their features
            y_batch = y[i]  # the labels of the sampled examples

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # Move the parameters a small step in the opposite direction of the gradient,
            # reducing the loss on this batch; learning_rate is the step size and grad is the gradient
            self.W -= learning_rate*grad

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history

def loss(self, X_batch, y_batch, reg):
        """
        Compute the loss function and its derivative.
        Subclasses will override this.

        Inputs:
        - X_batch: A numpy array of shape (N, D) containing a minibatch of N
          data points; each point has dimension D.
        - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
        - reg: (float) regularization strength.

        Returns: A tuple containing:
        - loss as a single float
        - gradient with respect to self.W; an array of the same shape as W
        """
        loss = 0.0  # initialize the loss as a float
        dW = np.zeros(self.W.shape)  # gradient array of the same shape as W (initialized to zero)

        # Compute the loss:
        num_train = X_batch.shape[0]  # number of examples in the batch

        scores = X_batch.dot(self.W)
        correct_scores = scores[np.arange(num_train), y_batch].reshape((num_train, -1))  # (N, 1)
        margins = np.maximum(0, scores - correct_scores + 1)
        margins[np.arange(num_train), y_batch] = 0  # the correct class does not contribute
        loss += np.sum(margins)  # sum of all margins
        loss /= num_train  # average over the batch
        loss += reg * np.sum(self.W * self.W)  # regularization

        # Compute the gradient (same as in svm_loss_vectorized):
        margins[margins > 0] = 1
        row_num = -np.sum(margins, 1)
        margins[np.arange(num_train), y_batch] = row_num
        dW += np.dot(X_batch.T, margins) / num_train + reg * self.W

        return loss, dW

Then run it with the code below:

# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))

Output:

iteration 0 / 1500: loss 799.173590
iteration 100 / 1500: loss 474.736085
iteration 200 / 1500: loss 288.689859
iteration 300 / 1500: loss 175.735395
iteration 400 / 1500: loss 107.535725
iteration 500 / 1500: loss 67.656875
iteration 600 / 1500: loss 42.015562
iteration 700 / 1500: loss 28.268911
iteration 800 / 1500: loss 18.718019
iteration 900 / 1500: loss 13.551437
iteration 1000 / 1500: loss 10.574625
iteration 1100 / 1500: loss 8.425946
iteration 1200 / 1500: loss 7.439375
iteration 1300 / 1500: loss 6.384190
iteration 1400 / 1500: loss 6.081119
That took 7.483288s

A useful debugging strategy is to plot the loss as a function of iteration number:

# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

(figure: the training loss decreasing as a function of iteration number)

Write the LinearSVM.predict function.

def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing data; there are N
          samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted class.
        """
        y_pred = np.zeros(X.shape[0])  # initialize the predictions
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # X: (N, D), self.W: (D, C)
        scores = X.dot(self.W)  # (N, C)
        y_pred = np.argmax(scores, axis=1)  # index of the highest score for each example, shape (N,)
        

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return y_pred

After writing the predict function, evaluate its performance on both the training set and the validation set:

# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

Output:

training accuracy: 0.382551
validation accuracy: 0.386000

Use the validation set to tune the hyperparameters (regularization strength and learning rate).

You should experiment with different ranges for the learning rates and regularization strengths; if you are careful you should be able to get a classification accuracy of about 0.39 on the validation set.

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.

#Note: you may see runtime/overflow warnings during hyper-parameter search. 
# This may be caused by extreme values, and is not a bug.
learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.

results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        svm = LinearSVM()  # train a linear SVM for each hyperparameter combination
        loss_history = svm.train(X_train, y_train,
                                 learning_rate=learning_rate,
                                 reg=regularization_strength,
                                 num_iters=1500, verbose=True)

        y_train_pred = svm.predict(X_train)  # predicted labels on the training set
        train_acc = np.mean(y_train == y_train_pred)  # training accuracy

        y_val_pred = svm.predict(X_val)  # predicted labels on the validation set
        val_acc = np.mean(y_val == y_val_pred)  # validation accuracy

        if val_acc > best_val:  # keep track of the best validation accuracy
            best_val = val_acc
            best_svm = svm  # and the LinearSVM object that achieves it

        # results maps (learning_rate, regularization_strength) tuples to
        # (training_accuracy, validation_accuracy)
        results[(learning_rate, regularization_strength)] = [train_acc, val_acc]
        

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

I got stuck on the assignment into the results dictionary here and referred to someone else's code. Pay attention to how it is written; it took me quite a while to find the mistake…

Output:

...
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.380755 val accuracy: 0.377000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.365347 val accuracy: 0.366000
lr 5.000000e-05 reg 2.500000e+04 train accuracy: 0.169265 val accuracy: 0.181000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.055898 val accuracy: 0.048000
best validation accuracy achieved during cross-validation: 0.377000
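
The grid above only covers four combinations, and the 5e-5 learning rate clearly diverges. To get closer to the ~0.39 target you would typically rerun the same double loop over a finer (hypothetical) grid around the region that worked, for example:

import numpy as np

# A finer, hypothetical search grid around the best region found above.
learning_rates = np.logspace(-7.5, -6.5, 5)            # roughly 3e-8 ... 3e-7
regularization_strengths = np.logspace(3.5, 4.5, 5)    # roughly 3e3 ... 3e4

Once the validation code is known to work, num_iters can also be increased.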

Visualize the cross-validation results:

# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()

Output:
(figure: two scatter plots of log learning rate vs. log regularization strength, colored by training and validation accuracy)
Evaluate the best SVM on the test set:

# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)

Output:

linear SVM on raw pixels final test set accuracy: 0.370000

Visualize the learned weights for each class:

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
      
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

Output:
(figure: the learned weight templates for the ten CIFAR-10 classes)

Inline Question 2

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

Your Answer: They look like blurred templates of each class, because each class's weight vector is learned from (and roughly averages over) all the training images of that class. (Based on others' answers…)
