cs231n Lecture 2

最新推荐文章于 2019-12-20 23:40:56 发布

一次人间也匆忙

最新推荐文章于 2019-12-20 23:40:56 发布

阅读量1k

点赞数

分类专栏： cs231n 2015-2016 winter 课程学习

本文链接：https://blog.csdn.net/u014421266/article/details/50917108

版权

cs231n 2015-2016 winter 课程学习专栏收录该内容

5 篇文章 0 订阅

订阅专栏

说明

除特殊说明，本文以及这个系列文章中的所有插图都来自斯坦福cs231n课程。
转载请注明出处；也请加上这个链接http://vision.stanford.edu/teaching/cs231n/syllabus.html
Feel free to contact me or leave a comment.

Abstract

这节课中，我们会介绍一个简单的图像分类流程，简单介绍NN，KNN，hyperparameter，交叉验证等知识，同时，我们会定义一个线性判别函数（linear score function）。后续课程中，会讲到损失函数（loss function），优化过程（Optimization）以及卷积神经网络。

Image Classification pipeline

Image Classification: a core task in computer vision.

图像分类是计算机视觉领域的一个核心任务，我们做检测，定位等任务时就会发现，这些任务都是以图像分类为基础。
而图像分类，就是给定一系列标记，对每张要预测的图片给以正确的标记划分。

Challenges:

和人的观察不同，对计算机来说，任何输入图像都是数字矩阵，怎么从这些个数字矩阵中分析出它是什么是极有挑战性的。就从图像本身来说，它的难点有：

Viewpoint Variation（同一只猫，左侧脸、右侧脸、正脸）
Illumination（白天和黑夜）
Deformation（站着，坐着，跑着，倒立着）
Occlusion
Background clutter
Intraclass variation（白猫、黑猫、花猫）

Data-driven approach

Attempts have been made

No obvious way to hard-code the algorithm for recognizing a cat, or other classes.

与“排序问题”不同，对于图像分类问题，没有一个既定的方法告诉你怎么做。
早年图像都是low resolution，所以在检测时都是从当前图像着手，从边缘这些分析，发现有类似耳朵、下巴等这些形状就认为是猫（人）。

attempts

Data-driven approach

现在的做法大都是Data-driven的，从大量已知的图片入手（训练集），用学习算法训练一个模型。训练阶段，训练一个从data到label的分类器；测试时，用该分类器对数据进行预测出标记。

Collect a dataset of images and labels
Use Machine Learning to train an image classifier
Evaluate the classifier on a withheld set of test images

First classifier: Nearest Neighbor Classifier

我们先介绍一种最简单的分类器：最近邻分类器。

最近邻分类器在其训练阶段，只需要简单的记住每张训练图片及其对应的类标；测试时，遍历整个训练集找到与当前图像最相似的训练样本，并返回其类标即可。

那我们怎么度量相似性呢？最简单的距离度量方式有：L1 (Manhattan) distance/L2 (Euclidean) distance.

那到底选择哪种距离呢？选择哪种距离，这就是一个hyperparameter，需要我们自己试参数为多少时最好。

那我们应该在测试集上面测试以求得最好的hyperparameters吗？
不是，如果直接用测试集调节参数很容易对测试集过拟合，而丧失模型的泛化能力。我们应该从训练集中划一部分出来做验证集来进调节参数；还可以进行交叉验证，最后进行模型平均。

KNN

我们刚才提到hyperparameter，那如果我们把最近邻换成K近邻，也就是从前K个最相似的类标中投票时，这个K也就变成了hyperparameter.

然而，k-Nearest Neighbor on images never used. 原因很简单：

1 terrible performance at test time

2 distance metrics on level of whole images can be very unintuitive

比如下面这张图中靠右的三张图与原图差别很大，却有着一样的欧氏距离。

k-NearestNeverUsed

Q: How does the classification speed depend on the size of the training data?
Linearly

Q: what is the accuracy of the nearest neighbor classifier on the training data, when using the Euclidean/Manhattan distance?
Zero

Q: what is the accuracy of the k-nearest neighbor classifier on the training data?
Problem-dependent.

Q: What is the best distance to use? What is the best value of k to use? How do we set the hyperparameters?
Very problem-dependent. Must try them all out and see what works best.

Q: Try out what hyperparameters work best on test set?
Very bad idea. The test set is a proxy for the generalization performance! Use only VERY SPARINGLY, at the end.

Summary

Image Classification: We are given a Training Set of labeled images, asked to predict labels on Test Set. Common to report the Accuracy of predictions (fraction of correctly predicted images)
We introduced the k-Nearest Neighbor Classifier, which predicts the labels based on nearest images in the training set
We saw that the choice of distance and the value of k are hyperparameters that are tuned using a validation set, or through cross-validation if the size of the data is small.
Once the best set of hyperparameters is chosen, the classifier is evaluated once on the test set, and reported as the performance of kNN on that data.

Linear Classification

task specific view? model based view? 这里没有听懂

Parametric approach

为了分类，我们定义一个函数f(x,W)，x是image，W是parameters，输入的f是一个得分score向量，如果该向量中0号元素得分最高，那么该线性分类器认为这个图片是0号元素对应的那个类。

linearParameters

实际过程中还有一个偏置bias.

Interpreting a Linear Classifier

Q: what does the linear classifier do, in English?

怎么样解释线性分类器做什么呢？

从CIFAR-10训练了一个线性分类器，将weights reshape成图像大小，如下图所示。我们可以看到plane classifier的weights图显蓝色多一点，也就意味着在R和G通道中，weights多为零或者负数。而在horse classifier的图中，马匹看来有点奇怪，可以猜测是不是CIFAR-10中马头向左的比向右的图像多一点？