CS231n Image Classification 01

Image Classification

These are my notes on the Stanford course CS231n: Convolutional Neural Networks for Visual Recognition.

The Computer's Task

Given an input image, assign it one label from a fixed set of labels.

  • The Problem:
  1. Semantic Gap
  2. Viewpoint variation
  3. Illumination
  4. Deformation
  5. Occlusion
  6. Intraclass variation

An image classifier

Writing this function by hand is hard: there is no obvious algorithm for recognizing an object.

def classify_image(image):
    # Do Some Magic
    return class_label
  • Attempts


Data-Driven Approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning to train a classifier
  3. Evaluate the classifier on new images
  • First classifier: Nearest Neighbor

Training: just memorize all the data and labels.

def train(images, labels):
    # Machine Learning!
    return model

Prediction: output the label of the most similar training image.

def predict(model, test_images):
    # Use model to predict labels
    return test_labels

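Example Dataset: CIFAR-10 (10 classes; 50,000 training images and 10,000 test images, each 32x32 pixels with 3 color channels)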


Issue: the nearest training images often look visually similar to the test image, yet the predicted labels still contain many errors.

  • The comparison (distance) function it uses

K-Nearest Neighbors Method

L1 distance: $d_1(I_1, I_2) = \sum\limits_{p} \left| I_1^p - I_2^p \right|$


The training image that minimizes this sum is the most similar image.
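The L1 comparison and the "copy the label of the closest training image" rule can be sketched in a few lines of plain Python. This is only a minimal illustration, not the course's reference code: images are assumed to be flat lists of pixel values, the toy data is made up, and the names `l1_distance` and `nearest_neighbor_predict` are hypothetical.

```python
def l1_distance(img_a, img_b):
    """Sum of absolute pixel-wise differences between two flattened images."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))


def nearest_neighbor_predict(train_images, train_labels, test_image):
    """Return the label of the training image with the smallest L1 distance."""
    best_index = min(
        range(len(train_images)),
        key=lambda i: l1_distance(train_images[i], test_image),
    )
    return train_labels[best_index]


# Toy data: three 4-pixel "images" (hypothetical values, not CIFAR-10).
train_images = [[0, 0, 0, 0], [10, 10, 10, 10], [200, 200, 200, 200]]
train_labels = ["dark", "dark", "bright"]

prediction = nearest_neighbor_predict(train_images, train_labels, [190, 210, 205, 180])
print(prediction)  # bright
```

Note that prediction scans the whole training set, so it gets slower as the dataset grows, while "training" costs almost nothing.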



What the decision regions look like


  1. An isolated yellow region around a single point
  2. Noise caused by one single point (a green region cutting into blue)

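Use K-Nearest Neighbors to smooth this out: instead of copying the label of the single nearest point, take a majority vote among the K nearest points.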
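A rough sketch of the majority vote, assuming flattened images stored as lists and the L1 distance from above. The function name `knn_predict` and the 1-pixel toy data are made up for this note; the data contains one "green" point sitting among "blue" points, similar to the noisy case described above.

```python
from collections import Counter


def l1_distance(img_a, img_b):
    return sum(abs(a - b) for a, b in zip(img_a, img_b))


def knn_predict(train_images, train_labels, test_image, k=3):
    """Majority vote among the k training images closest to test_image."""
    order = sorted(
        range(len(train_images)),
        key=lambda i: l1_distance(train_images[i], test_image),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 1-pixel data: one "green" point sits among "blue" points.
train_images = [[1], [2], [3], [4], [10], [11], [12]]
train_labels = ["blue", "blue", "green", "blue", "green", "green", "green"]

print(knn_predict(train_images, train_labels, [3], k=1))  # green (follows the noisy point)
print(knn_predict(train_images, train_labels, [3], k=3))  # blue (the vote outweighs it)
```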

Another Comparison Function
L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum\limits_{p} \left( I_1^p - I_2^p \right)^2}$

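The L1 distance depends on the choice of coordinate system: rotating the coordinate axes changes the L1 distance between two points, while the L2 distance stays the same. Geometrically, the points at a fixed L2 distance from the origin form a circle, which looks the same after any rotation, whereas the points at a fixed L1 distance form a square with its corners on the axes.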
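A small numeric check of this claim, as a sketch in plain Python. The helper names (`l1_norm`, `l2_norm`, `rotate`) are made up for illustration; the example rotates the 2D vector (1, 0) by 45 degrees and compares both norms before and after.

```python
import math


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def rotate(v, degrees):
    """Rotate a 2D vector counter-clockwise by the given angle."""
    t = math.radians(degrees)
    x, y = v
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))


v = (1.0, 0.0)
r = rotate(v, 45)

print(l1_norm(v), l1_norm(r))  # L1 changes: 1.0 versus roughly 1.414
print(l2_norm(v), l2_norm(r))  # L2 stays at 1.0 (up to rounding)
```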

  • What is the best value of k?
  • What is the best distance to use? (L1, L2, or something else)

These are hyperparameters: choices about the algorithm that we set in advance rather than learn automatically from the data.

The best choice is very problem-dependent, so we just have to try them out. But how?


The training and validation process must not be mixed with the test data: touch the test set only once, at the very end.

  • Cross Validation


  • Validation process


Use the validation data to choose the best hyperparameters.
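As a sketch of this procedure, the snippet below picks k for a tiny k-nearest-neighbor classifier using only the validation split, then evaluates once on the test split. Everything here is an assumption made for illustration: the 1-D toy data, the candidate values of k, and the helper names `knn_predict` and `accuracy`.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (distance on scalars)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def accuracy(train_x, train_y, eval_x, eval_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)


# Hypothetical 1-D toy data, already split into train / validation / test.
train_x, train_y = [1, 2, 3, 4, 10, 11, 12], ["a", "a", "b", "a", "b", "b", "b"]
val_x, val_y = [2.9, 3.2, 10.5], ["a", "a", "b"]
test_x, test_y = [1.5, 11.5], ["a", "b"]

# Pick k using the validation set only.
best_k = max([1, 3, 5], key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))

# Touch the test set once, with the chosen k.
test_acc = accuracy(train_x, train_y, test_x, test_y, best_k)
print(best_k, test_acc)
```

Cross-validation follows the same idea but repeats the train/validation split over several folds and averages the validation accuracy, which is mostly practical for small datasets.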


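Because the distance only sums up per-pixel differences, images that differ from the original in very different ways can still end up with the same L2 distance to it. Pixel-wise distance is therefore a poor measure of how similar two images really look.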
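A tiny made-up example of this effect: changing one pixel by 2 and shifting all four pixels by 1 are quite different modifications, yet both give the same L2 distance. The 4-pixel "images" and the name `l2_distance` are assumptions for illustration only.

```python
import math


def l2_distance(img_a, img_b):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img_a, img_b)))


original = [100, 100, 100, 100]
one_pixel_changed = [102, 100, 100, 100]   # a single pixel moved by 2
all_pixels_shifted = [101, 101, 101, 101]  # every pixel moved by 1

print(l2_distance(original, one_pixel_changed))   # 2.0
print(l2_distance(original, all_pixels_shifted))  # 2.0
```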

Linear Classification

  • Parametric Model

$f(x, W) = Wx + b$

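For CIFAR-10, f(x,W) must be a 10x1 vector of class scores, and x is the image flattened into a 3072x1 vector (32 x 32 x 3), so W must be 10x3072. The bias b is a 10x1 vector of per-class offsets that do not depend on the image; it can help, for example, when some classes are more common than others.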
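To make the shapes concrete, here is a minimal sketch in plain Python using nested lists instead of a matrix library. The weights and the "image" are random placeholders rather than learned parameters or real data, and `linear_scores` is a name made up for this note.

```python
import random

NUM_CLASSES = 10          # CIFAR-10 classes
NUM_PIXELS = 32 * 32 * 3  # 3072 values per flattened image


def linear_scores(W, x, b):
    """Compute f(x, W) = Wx + b for a flattened image x."""
    return [sum(w * p for w, p in zip(row, x)) + bias for row, bias in zip(W, b)]


random.seed(0)
# Random placeholders standing in for learned parameters and a real image.
W = [[random.gauss(0, 0.01) for _ in range(NUM_PIXELS)] for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES
x = [random.random() for _ in range(NUM_PIXELS)]

scores = linear_scores(W, x, b)  # one score per class
predicted_class = max(range(NUM_CLASSES), key=lambda c: scores[c])
print(len(scores), predicted_class)
```

Each row of W acts as a template for one class, and the predicted class is the one whose row gives the highest score.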



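Geometrically, each class gets a single linear boundary (a line in 2D, a hyperplane in general) that separates it from the other classes based on the raw RGB pixel values.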

But how can we tell how good a given W is?
(See the next lecture.)

  • Problems

Since the classifier is linear, the problem is obvious: it cannot separate classes that are not linearly separable.







