CS231n_2020 (1): Image Classification


chaney1ee


Tags: python, computer vision

Original source: https://cs231n.github.io/classification/


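Stanford's CS231n, whose full title is Convolutional Neural Networks for Visual Recognition, was recommended to me by my advisor because my graduation project uses deep learning for object recognition. Since learning together beats learning alone, I plan to keep sharing my CS231n notes over the coming weeks.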

Image Classification

https://cs231n.github.io/classification/

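Goal
This section introduces the image classification problem: assigning an input image one label from a fixed set of categories. It is one of the core problems in computer vision and has a wide range of practical applications.
Example
Given an image and a label set such as {cat, dog, hat, mug}, the classifier should output a single label, e.g. cat.
Basic steps of image classification
The task of image classification is to take an array of pixels that represents a single image and assign a label to it. The basic steps are: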

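Input: the training set, consisting of N images, each labeled with one of K classes.
Learning: use the training set to learn what each class looks like. This step is called training a classifier, or learning a model.
Evaluation: assess the quality of the trained classifier by asking it to predict labels for a new set of images it has never seen before (held-out data), then compare the predictions with the true labels (the ground truth). Ideally, most predictions match.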

Nearest Neighbor Classifier

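This example uses the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html), which consists of 60,000 small images, each 32 pixels high and wide. Each image is labeled with one of 10 classes (for example airplane, automobile, bird). The 60,000 images are split into a training set of 50,000 images and a test set of 10,000 images.
[Figure: 10 random example images from each of the 10 CIFAR-10 classes]
To compare two images we use the L1 distance: the sum of the absolute values of the pixel-wise differences between them, as shown below:
L1 distance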
$\begin{aligned} d_{1}(I_{1}, I_{2}) = \sum_{p}|I_{1}^{p} - I_{2}^{p}| \end{aligned}$
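As a small worked example of the formula above, the sketch below computes the L1 distance between two made-up 2x2 single-channel images. The pixel values and the helper name `l1_distance` are illustrative assumptions and are not part of the course code.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Sum of absolute pixel-wise differences between two images of equal shape."""
    # Cast to a signed integer type first: subtracting uint8 arrays directly
    # would wrap around instead of producing negative differences.
    a = np.asarray(img_a, dtype=np.int64)
    b = np.asarray(img_b, dtype=np.int64)
    return int(np.sum(np.abs(a - b)))

# Two hypothetical 2x2 grayscale images (values chosen for illustration only)
I1 = np.array([[56, 32], [90, 23]], dtype=np.uint8)
I2 = np.array([[10, 20], [100, 25]], dtype=np.uint8)

# |56-10| + |32-20| + |90-100| + |23-25| = 46 + 12 + 10 + 2
d = l1_distance(I1, I2)
print(d)  # 70
```

The cast matters only if the pixel arrays are stored as unsigned integers; with floating-point image data the subtraction is safe as written.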

A test image is assigned the label of the training image that has the smallest L1 distance to it. The code:

Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072

nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the fraction
# of test examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % (np.mean(Yte_predict == Yte)))

The NearestNeighbor class used above is implemented as follows:

import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is 1-dimensional, of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # let's make sure that the output type matches the type of the training labels
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred

Besides the L1 distance, the L2 distance is also commonly used to compare images.
L2 distance
$\begin{aligned} d_{2}(I_{1}, I_{2}) = \sqrt{\sum_{p}(I_{1}^{p} - I_{2}^{p})^2} \end{aligned}$
L1 and L2 are the most commonly used special cases of the vector p-norm.
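To make the relationship between the two distances concrete, here is a minimal sketch that computes the p-norm of the difference between two vectors, with p = 1 and p = 2 recovering the L1 and L2 distances. The function name `lp_distance` and the example vectors are assumptions made for illustration.

```python
import numpy as np

def lp_distance(a, b, p):
    """p-norm of the difference between two flattened images (p >= 1)."""
    diff = np.abs(np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64))
    return float(np.sum(diff ** p) ** (1.0 / p))

# Hypothetical flattened "images" with three pixels each
I1 = np.array([3.0, 0.0, 0.0])
I2 = np.array([0.0, 4.0, 0.0])

d1 = lp_distance(I1, I2, p=1)  # 3 + 4 = 7
d2 = lp_distance(I1, I2, p=2)  # sqrt(9 + 16) = 5

# Cross-check against NumPy's built-in vector norms
print(d1, np.linalg.norm(I1 - I2, ord=1))
print(d2, np.linalg.norm(I1 - I2, ord=2))
```

For nearest neighbor search, the square root in the L2 distance can be skipped: it is a monotonic function, so it changes the distance values but not which training image is closest.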

k - Nearest Neighbor Classifier

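The idea is simple: instead of using only the single closest training image, find the k closest training images and let them vote on the label of the test image. With k = 1 this is exactly the Nearest Neighbor classifier above.
[Figure: image missing from the source]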
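A minimal sketch of a k-nearest-neighbor classifier with majority voting follows. It reuses the L1 distance from the Nearest Neighbor code; the class name `KNearestNeighbor`, the tie-breaking rule, and the toy data are assumptions for illustration and are not taken from the course notes.

```python
import numpy as np

class KNearestNeighbor:
    def train(self, X, y):
        """X is N x D, one example per row. y holds the N integer labels."""
        self.Xtr = np.asarray(X, dtype=np.float64)
        self.ytr = np.asarray(y)

    def predict(self, X, k=1):
        """Predict a label for each row of X by majority vote among the k nearest training rows."""
        X = np.asarray(X, dtype=np.float64)
        Ypred = np.zeros(X.shape[0], dtype=self.ytr.dtype)
        for i in range(X.shape[0]):
            # L1 distance from the i-th test row to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            nearest = np.argsort(distances)[:k]  # indices of the k closest rows
            labels, counts = np.unique(self.ytr[nearest], return_counts=True)
            # Majority vote; on a tie, the smallest label wins (an arbitrary choice)
            Ypred[i] = labels[np.argmax(counts)]
        return Ypred

# Toy 2-D data: the last training point sits among class 0 but carries label 1
Xtr = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [1, 1]])
ytr = np.array([0, 0, 0, 1, 1, 1])
Xte = np.array([[1.2, 1.2]])

knn = KNearestNeighbor()
knn.train(Xtr, ytr)
pred_k1 = knn.predict(Xte, k=1)  # follows the single closest (oddly labeled) point
pred_k3 = knn.predict(Xte, k=3)  # the two other neighbors outvote it
print(pred_k1, pred_k3)  # [1] [0]
```

The toy data shows why a larger k can help: one unusual training label decides the prediction when k = 1, while the vote with k = 3 smooths it out.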

Validation sets for Hyperparameter tuning

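The k-Nearest Neighbor classifier needs a value for k, so which value works best? Beyond k, there is also the choice of distance function: the L1 norm, the L2 norm, or other options such as dot products. These choices about the model itself are called hyperparameters.
One way to choose hyperparameters is to try many different values and see which works best, but this must not be done on the test set ("we cannot use the test set for the purpose of tweaking hyperparameters"). The test set is a very precious resource in machine learning and should be used only once, at the very end ("Evaluate on the test set only a single time, at the very end").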
Split your training set into training set and a validation set. Use validation set to tune all hyperparameters. At the end run a single time on the test set and report performance.
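The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the course's code: the data is synthetic (two Gaussian blobs), and the helper `knn_predict`, the split sizes, and the candidate values of k are all made up for the example.

```python
import numpy as np

def knn_predict(Xtr, ytr, X, k):
    """k-nearest-neighbor prediction with L1 distance and majority vote."""
    preds = np.zeros(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        distances = np.sum(np.abs(Xtr - X[i, :]), axis=1)
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(ytr[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]
    return preds

# Synthetic stand-in for a real dataset: two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(10.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(200)
X, y = X[perm], y[perm]

# Hold out a test set, then split the rest into training and validation sets
Xtr, ytr = X[:120], y[:120]
Xval, yval = X[120:160], y[120:160]
Xte, yte = X[160:], y[160:]

# Tune k on the validation set only
val_acc = {}
for k in [1, 3, 5, 7]:
    val_acc[k] = float(np.mean(knn_predict(Xtr, ytr, Xval, k) == yval))

best_k = max(val_acc, key=val_acc.get)

# Touch the test set exactly once, with the chosen hyperparameter
test_acc = float(np.mean(knn_predict(Xtr, ytr, Xte, best_k) == yte))
print('best k: %d, test accuracy: %f' % (best_k, test_acc))
```

The same loop extends to other hyperparameters, such as the choice between L1 and L2 distance, by iterating over combinations and still reporting test accuracy only once.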