Image Classification
These are my notes on Stanford's CS231n course, Convolutional Neural Networks for Visual Recognition.
The Computer's Job
Take an image as input and assign it one label from a given set of labels.
- The Problem:
- Semantic Gap
- Viewpoint variation
- Illumination
- Deformation
- Occlusion
- Intraclass variation
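An image classifier
Writing one by hand is difficult: there is no obvious rule-based algorithm to put inside the function.
def classify_image(image):
    # Do some magic (no obvious hand-written rules exist for this step)
    return class_label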
- Attempts
![](https://s1.ax1x.com/2020/11/07/BIMSmD.png)
Data-Driven Approach
- Collect a dataset of images and labels
- Use Machine Learning to train a classifier
- Evaluate the classifier on new images
- First classifier: Nearest Neighbor
Training: just memorize all the data and labels.
def train(images, labels):
    # Machine learning! For Nearest Neighbor, this only stores images and labels
    return model
Prediction: output the label of the most similar training image.
def predict(model, test_images):
    # Use the model to predict a label for each test image
    return test_labels
Example Dataset: CIFAR10
![](https://s1.ax1x.com/2020/11/07/BIM9TH.png)
Issue: the nearest training images may look visually similar to the test image, but the predicted labels are still often wrong.
- Comparison function used by Nearest Neighbor
K-Nearest Neighbors Method
L1 distance: $d_1(I_1,I_2) = \sum\limits_{p} \mid I_1^p - I_2^p \mid$
![](https://s1.ax1x.com/2020/11/07/BIKXSx.png)
Sum the absolute pixel-wise differences; the training image with the smallest sum is the most similar one.
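As a minimal sketch of the idea above, assuming each image is flattened into one row of a NumPy array (the class name `NearestNeighbor` and the toy 4-pixel "images" are illustrative assumptions, not data from the course):

```python
import numpy as np

class NearestNeighbor:
    """1-nearest-neighbor classifier using the L1 distance (illustrative sketch)."""

    def train(self, images, labels):
        # "Training" only memorizes the data: images is (N, D), labels is (N,)
        self.images = np.asarray(images, dtype=np.float64)
        self.labels = np.asarray(labels)

    def predict(self, test_images):
        test_images = np.asarray(test_images, dtype=np.float64)
        predictions = np.empty(len(test_images), dtype=self.labels.dtype)
        for i, image in enumerate(test_images):
            # L1 distance from this test image to every training image
            distances = np.abs(self.images - image).sum(axis=1)
            # Copy the label of the closest training image
            predictions[i] = self.labels[np.argmin(distances)]
        return predictions


# Toy data: three 4-pixel "images" with made-up labels
train_images = np.array([[0, 0, 0, 0], [10, 10, 10, 10], [200, 200, 200, 200]])
train_labels = np.array([0, 1, 2])

model = NearestNeighbor()
model.train(train_images, train_labels)
nn_predictions = model.predict(np.array([[12, 9, 11, 10], [190, 210, 205, 199]]))
```

Note that `train` does almost no work, while `predict` has to scan the whole training set for every test image.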
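Backwards: training is O(1) but prediction is O(N). We want the opposite: fast prediction, even if training is slow.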
![](https://s1.ax1x.com/2020/11/07/BIKjl6.png)
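What the decision regions look like
Issues
- An isolated yellow region caused by a single yellow point
- Noise from single points (green pushing into the blue region)
Use K-Nearest Neighbors to smooth this out: take a majority vote among the K closest points instead of copying the label of the single nearest one.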
![](https://s1.ax1x.com/2020/11/07/BIMitA.png)
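The majority vote behind K-Nearest Neighbors could look roughly like this. It is a sketch under assumptions: the function name `knn_predict`, the L1 distance, the tie-breaking rule, and the toy 2-D points are all chosen here for illustration.

```python
import numpy as np

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # L1 distance from the query to every training point
    distances = np.abs(train_points - query).sum(axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances, kind="stable")[:k]
    # Count votes per label; np.bincount needs non-negative integer labels
    votes = np.bincount(train_labels[nearest])
    return int(np.argmax(votes))  # ties go to the smallest label


# Toy 2-D data: a group of class-0 points with one noisy class-1 point among them
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [5.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1])
query = np.array([0.6, 0.6])

label_k1 = knn_predict(points, labels, query, k=1)  # follows the noisy point
label_k3 = knn_predict(points, labels, query, k=3)  # the noisy point is outvoted
```

With `k=1` the query inherits the label of the single noisy neighbor; with `k=3` the two surrounding class-0 points outvote it.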
Another Comparison Function
L2 (Euclidean) distance:
$$d_2(I_1,I_2) = \sqrt{\sum\limits_{p}{(I_1^p - I_2^p)}^2}$$
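The L1 distance depends on the choice of coordinate system: rotating the coordinate frame changes the L1 distance between two points, but not the L2 distance. Geometrically, the points at a fixed L2 distance from the origin form a circle, which looks the same after any rotation, while for L1 they form a square aligned with the axes.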
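A small numerical check of this claim, as a sketch: the helper names `l1_norm` and `l2_norm`, the example vector, and the 45-degree angle are assumptions picked for illustration.

```python
import numpy as np

def l1_norm(v):
    return np.abs(v).sum()

def l2_norm(v):
    return np.sqrt((v ** 2).sum())

# Difference between two points, aligned with the x axis
diff = np.array([1.0, 0.0])

# Rotate by 45 degrees
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
rotated = rotation @ diff

l1_before, l1_after = l1_norm(diff), l1_norm(rotated)  # 1.0 -> about 1.414
l2_before, l2_after = l2_norm(diff), l2_norm(rotated)  # 1.0 -> 1.0 (up to rounding)
```

The L1 value grows from 1 to the square root of 2 after the rotation, while the L2 value is unchanged.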
Hyperparameters
- What is the best value of k?
- What is the best distance to use (L1, L2, or something else)?
These are choices about the algorithm that we set in advance rather than learn from the data.
They are very problem-dependent, so the usual advice is to try them all and see what works best. But how should we try them?
![](https://s1.ax1x.com/2020/11/07/BIME1P.png)
The test data must not be used during training or validation; set it aside and use it only for the final evaluation.
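One way to keep the three sets apart is to split the example indices once, up front. The sketch below assumes a hypothetical helper `split_indices` and 10% validation / 10% test fractions; these numbers are illustrative, not taken from the course.

```python
import numpy as np

def split_indices(num_examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle example indices and split them into train / validation / test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_examples)
    num_test = int(num_examples * test_fraction)
    num_val = int(num_examples * val_fraction)
    test_idx = order[:num_test]
    val_idx = order[num_test:num_test + num_val]
    train_idx = order[num_test + num_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000)
# Tune hyperparameters using train_idx and val_idx only;
# touch test_idx once, at the very end, to report the final accuracy.
```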
- Cross Validation
![](https://s1.ax1x.com/2020/11/07/BIMApt.png)
- Validation process
![](https://s1.ax1x.com/2020/11/07/BIMV6f.png)
Use the validation data to choose the best hyperparameters.
![](https://s1.ax1x.com/2020/11/07/BIMu7Q.png)
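A rough sketch of k-fold cross-validation for choosing k. The helper names `knn_predict` and `cross_validate`, the five folds, the candidate values of k, and the synthetic two-blob data are all assumptions made for illustration; real image data would replace `x` and `y`.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k):
    """Label each test row by majority vote among its k nearest (L2) training rows."""
    predictions = []
    for row in test_x:
        distances = np.sqrt(((train_x - row) ** 2).sum(axis=1))
        nearest = np.argsort(distances, kind="stable")[:k]
        predictions.append(np.argmax(np.bincount(train_y[nearest])))
    return np.array(predictions)

def cross_validate(x, y, k_choices, num_folds=5):
    """Return the mean validation accuracy across folds for each candidate k."""
    x_folds = np.array_split(x, num_folds)
    y_folds = np.array_split(y, num_folds)
    mean_accuracy = {}
    for k in k_choices:
        accuracies = []
        for i in range(num_folds):
            # Fold i is the validation set; the remaining folds form the training set
            train_x = np.concatenate(x_folds[:i] + x_folds[i + 1:])
            train_y = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            predicted = knn_predict(train_x, train_y, x_folds[i], k)
            accuracies.append(np.mean(predicted == y_folds[i]))
        mean_accuracy[k] = float(np.mean(accuracies))
    return mean_accuracy


# Synthetic stand-in for real data: two Gaussian blobs in 2-D
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, (50, 2)), rng.normal(4.0, 1.0, (50, 2))])
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
shuffle = rng.permutation(len(x))
x, y = x[shuffle], y[shuffle]

cv_accuracy = cross_validate(x, y, k_choices=[1, 3, 5, 7])
best_k = max(cv_accuracy, key=cv_accuracy.get)
```

Every example serves as validation data exactly once, which gives a less noisy estimate than a single split at the cost of training `num_folds` times per candidate.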
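Because the L2 distance sums the pixel-wise differences, images altered in very different ways (for example, partly occluded, shifted, or tinted) can all end up at the same L2 distance from the original, so distances on raw pixels are not very informative.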
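To make this concrete, here is a small constructed example (the 8x8 "image" and both perturbations are made up for illustration) in which two very different changes give exactly the same L2 distance:

```python
import numpy as np

# Stand-in "image": an 8x8 grayscale array of mid-gray pixels (values are made up)
original = np.full((8, 8), 100.0)

# Perturbation A: brighten every pixel slightly (a uniform tint)
tinted = original + 2.0

# Perturbation B: change a single pixel by a large amount (a tiny occluding blob)
occluded = original.copy()
occluded[0, 0] += 16.0

def l2_distance(a, b):
    return np.sqrt(((a - b) ** 2).sum())

dist_tinted = l2_distance(original, tinted)      # sqrt(64 * 2^2) = 16
dist_occluded = l2_distance(original, occluded)  # sqrt(16^2) = 16
```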
Linear Classification
- Parametric Model
![](https://s1.ax1x.com/2020/11/07/BIMZX8.png)
$f(x,W) = Wx + b$
For CIFAR10 we need f(x,W) to be 10x1 (one score per class), and x is a flattened image of size 3072x1 (32x32x3), so W must be 10x3072. The bias b is 10x1; it adds a per-class offset that does not depend on the input.
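The shapes can be checked with a few lines of NumPy. This is only a sketch: the random `W`, the zero `b`, and the fake image `x` are placeholders, not trained parameters or real data.

```python
import numpy as np

num_classes, num_pixels = 10, 32 * 32 * 3  # CIFAR10: 10 classes, 3072 values per image

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(num_classes, num_pixels))    # weights, 10 x 3072 (random placeholder)
b = np.zeros(num_classes)                                     # bias, one value per class
x = rng.integers(0, 256, size=num_pixels).astype(np.float64)  # a fake flattened image

def f(x, W, b):
    """Linear score function: one score per class."""
    return W @ x + b

scores = f(x, W, b)                       # shape (10,)
predicted_class = int(np.argmax(scores))  # class with the highest score
```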
![](https://s1.ax1x.com/2020/11/07/BIMn0g.png)
![](https://s1.ax1x.com/2020/11/07/BIMMkj.png)
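For each class, it uses a single linear boundary (a line in 2-D, a hyperplane in pixel space) to separate that class from the rest, based on the RGB pixel values.
But how can we tell how good a given W is?
(See the next lecture.)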
- Problems
![](https://s1.ax1x.com/2020/11/07/BIMQts.png)
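Because the classifier is linear, the problem is obvious: classes that cannot be separated by a single line (hyperplane) cannot be classified correctly.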
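A tiny illustration of this limit, using XOR-style data where opposite corners of the unit square share a label. Note the swapped-in technique: the sketch fits the linear function by ordinary least squares, which is an assumption made for brevity and not how the course trains its classifier.

```python
import numpy as np

# XOR-style data: opposite corners of the unit square share a label
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 1.0, 1.0, 0.0])

# Fit f(x) = w . x + b by least squares; the column of ones carries the bias b
design = np.hstack([points, np.ones((4, 1))])
params, *_ = np.linalg.lstsq(design, targets, rcond=None)
fitted = design @ params  # the best linear fit outputs 0.5 for every point
```

The best linear fit cannot tell the two classes apart at all: it gives the same output for all four points.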