# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')
# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment1/'
FOLDERNAME = None
assert FOLDERNAME is not None, "[!] Enter the foldername."
# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))
# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME
Image features exercise
Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.
We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.
All of your work for this exercise will be done in this notebook.
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
Load data
Similar to previous exercises, we will load CIFAR-10 data from disk.
from cs231n.features import color_histogram_hsv, hog_feature
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'

    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    try:
        del X_train, y_train
        del X_test, y_test
        print('Clear previously loaded data.')
    except:
        pass

    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    return X_train, y_train, X_val, y_val, X_test, y_test
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
Extract Features
For each image we will compute a Histogram of Oriented
Gradients (HOG) as well as a color histogram using the hue channel in HSV
color space. We form our final feature vector for each image by concatenating
the HOG and color histogram feature vectors.
Roughly speaking, HOG should capture the texture of the image while ignoring
color information, and the color histogram represents the color of the input
image while ignoring texture. As a result, we expect that using both together
ought to work better than using either alone. Verifying this assumption would
be a good thing to try for your own interest.
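To make the color half of the feature vector concrete, here is a minimal sketch of a hue histogram. It is not the course's color_histogram_hsv implementation: the function name hue_histogram, the per-pixel loop, and the use of the standard-library colorsys module are assumptions made for illustration only.

```python
import colorsys

import numpy as np


def hue_histogram(img, nbin=10):
    """Return a normalized histogram of hue values for an RGB image.

    img: array of shape (H, W, 3) with channel values in [0, 255].
    """
    # Flatten to a list of pixels and scale channels to [0, 1].
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3) / 255.0
    # colorsys returns (h, s, v); keep only the hue, which lies in [0, 1).
    hues = np.array([colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in pixels])
    hist, _ = np.histogram(hues, bins=nbin, range=(0.0, 1.0))
    return hist / hist.sum()


# A solid red image has hue 0 everywhere, so all mass falls in the first bin.
red = np.zeros((4, 4, 3))
red[:, :, 0] = 255
red_hist = hue_histogram(red, nbin=10)

# A solid green image has hue 1/3, which falls in a different bin.
green = np.zeros((4, 4, 3))
green[:, :, 1] = 255
green_hist = hue_histogram(green, nbin=10)

print(red_hist)
print(green_hist)
```

Because the histogram discards pixel positions, two images with the same colors in different arrangements produce the same vector, which is why the text pairs it with a gradient-based descriptor such as HOG.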
The hog_feature and color_histogram_hsv functions both operate on a single
image and return a feature vector for that image. The extract_features
function takes a set of images and a list of feature functions and evaluates
each feature function on each image, storing the results in a matrix where
each row is the concatenation of all feature vectors for a single image.
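The following is a rough sketch of that concatenation step, not the code in cs231n/features.py. The name extract_features_sketch and the two toy feature functions are hypothetical. The result has one row per image, which matches the (49000, 155) shape printed later in this notebook.

```python
import numpy as np


def extract_features_sketch(imgs, feature_fns):
    """Apply every feature function to every image; return an (N, F) matrix."""
    rows = []
    for img in imgs:
        # Each function returns a 1-D vector; join them into a single row.
        rows.append(np.concatenate([np.ravel(fn(img)) for fn in feature_fns]))
    return np.stack(rows)


# Two toy feature functions (hypothetical, for illustration only).
def mean_rgb(img):
    return img.mean(axis=(0, 1))      # 3 values, one per channel


def brightness(img):
    return np.array([img.mean()])     # 1 value


imgs = np.random.rand(5, 32, 32, 3)
feats = extract_features_sketch(imgs, [mean_rgb, brightness])
print(feats.shape)  # (5, 4)
```

Swapping a function in or out of the list changes only the number of columns, which makes it easy to compare HOG alone, color alone, and both together.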
from cs231n.features import *
num_color_bins = 10 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)
# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat
# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat
# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])
Done extracting features for 1000 / 49000 images
Done extracting features for 2000 / 49000 images
Done extracting features for 3000 / 49000 images
Done extracting features for 4000 / 49000 images
Done extracting features for 5000 / 49000 images
Done extracting features for 6000 / 49000 images
Done extracting features for 7000 / 49000 images
Done extracting features for 8000 / 49000 images
Done extracting features for 9000 / 49000 images
Done extracting features for 10000 / 49000 images
Done extracting features for 11000 / 49000 images
Done extracting features for 12000 / 49000 images
Done extracting features for 13000 / 49000 images
Done extracting features for 14000 / 49000 images
Done extracting features for 15000 / 49000 images
Done extracting features for 16000 / 49000 images
Done extracting features for 17000 / 49000 images
Done extracting features for 18000 / 49000 images
Done extracting features for 19000 / 49000 images
Done extracting features for 20000 / 49000 images
Done extracting features for 21000 / 49000 images
Done extracting features for 22000 / 49000 images
Done extracting features for 23000 / 49000 images
Done extracting features for 24000 / 49000 images
Done extracting features for 25000 / 49000 images
Done extracting features for 26000 / 49000 images
Done extracting features for 27000 / 49000 images
Done extracting features for 28000 / 49000 images
Done extracting features for 29000 / 49000 images
Done extracting features for 30000 / 49000 images
Done extracting features for 31000 / 49000 images
Done extracting features for 32000 / 49000 images
Done extracting features for 33000 / 49000 images
Done extracting features for 34000 / 49000 images
Done extracting features for 35000 / 49000 images
Done extracting features for 36000 / 49000 images
Done extracting features for 37000 / 49000 images
Done extracting features for 38000 / 49000 images
Done extracting features for 39000 / 49000 images
Done extracting features for 40000 / 49000 images
Done extracting features for 41000 / 49000 images
Done extracting features for 42000 / 49000 images
Done extracting features for 43000 / 49000 images
Done extracting features for 44000 / 49000 images
Done extracting features for 45000 / 49000 images
Done extracting features for 46000 / 49000 images
Done extracting features for 47000 / 49000 images
Done extracting features for 48000 / 49000 images
Done extracting features for 49000 / 49000 images
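The three preprocessing steps in the cell above can be checked on synthetic data. The sketch below uses made-up random arrays (not CIFAR-10 features) and shows the key point: the mean and standard deviation come from the training split only and are then reused for the other splits.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))  # stand-in for X_train_feats
val = rng.normal(loc=5.0, scale=3.0, size=(200, 4))     # stand-in for X_val_feats

# Statistics are computed per feature (axis=0) on the training split only.
mean = train.mean(axis=0, keepdims=True)
std = train.std(axis=0, keepdims=True)

train_n = (train - mean) / std
val_n = (val - mean) / std  # reuse the training statistics, do not recompute

# Append a constant column so a linear classifier can learn a bias term.
train_b = np.hstack([train_n, np.ones((train_n.shape[0], 1))])

print(train_n.mean(axis=0), train_n.std(axis=0))
print(train_b.shape)  # (1000, 5)
```

The validation split ends up only approximately zero-mean and unit-variance, which is expected: it is transformed with statistics it did not contribute to.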
Train SVM on features
Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels.
# Use the validation set to tune the learning rate and regularization strength
from cs231n.classifiers.linear_classifier import LinearSVM
learning_rates = [1e-9, 1e-8, 1e-7]
regularization_strengths = [5e4, 5e5, 5e6]
results = {}
best_val = -1
best_svm = None
################################################################################
# TODO: #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save #
# the best trained classifier in best_svm. You might also want to play #
# with different numbers of bins in the color histogram. If you are careful #
# you should be able to get accuracy of near 0.44 on the validation set. #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for rate in learning_rates:
    for reg in regularization_strengths:
        svm = LinearSVM()
        loss = svm.train(X_train_feats, y_train, learning_rate=rate, reg=reg,
                         num_iters=1500, verbose=False)
        y_train_pred = svm.predict(X_train_feats)
        training_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val_feats)
        validation_accuracy = np.mean(y_val == y_val_pred)
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_svm = svm
        results[(rate, reg)] = (training_accuracy, validation_accuracy)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved: %f' % best_val)
lr 1.000000e-09 reg 5.000000e+04 train accuracy: 0.100204 val accuracy: 0.088000
lr 1.000000e-09 reg 5.000000e+05 train accuracy: 0.090612 val accuracy: 0.094000
lr 1.000000e-09 reg 5.000000e+06 train accuracy: 0.177041 val accuracy: 0.184000
lr 1.000000e-08 reg 5.000000e+04 train accuracy: 0.121143 val accuracy: 0.124000
lr 1.000000e-08 reg 5.000000e+05 train accuracy: 0.341898 val accuracy: 0.336000
lr 1.000000e-08 reg 5.000000e+06 train accuracy: 0.414265 val accuracy: 0.423000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.414143 val accuracy: 0.421000
lr 1.000000e-07 reg 5.000000e+05 train accuracy: 0.403653 val accuracy: 0.400000
lr 1.000000e-07 reg 5.000000e+06 train accuracy: 0.353959 val accuracy: 0.370000
best validation accuracy achieved: 0.423000
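The grid results printed above can also be scanned programmatically. This sketch copies those nine rows into a dictionary with the same (learning rate, regularization) to (train accuracy, validation accuracy) layout as results, then picks the pair with the highest validation accuracy. The variable names best_lr and best_reg are introduced here for illustration.

```python
# Values copied from the printed grid-search output.
results = {
    (1e-9, 5e4): (0.100204, 0.088000),
    (1e-9, 5e5): (0.090612, 0.094000),
    (1e-9, 5e6): (0.177041, 0.184000),
    (1e-8, 5e4): (0.121143, 0.124000),
    (1e-8, 5e5): (0.341898, 0.336000),
    (1e-8, 5e6): (0.414265, 0.423000),
    (1e-7, 5e4): (0.414143, 0.421000),
    (1e-7, 5e5): (0.403653, 0.400000),
    (1e-7, 5e6): (0.353959, 0.370000),
}

# Select the key whose validation accuracy (second tuple entry) is largest.
best_lr, best_reg = max(results, key=lambda k: results[k][1])
best_val_acc = results[(best_lr, best_reg)][1]
print('best lr %e reg %e val accuracy %f' % (best_lr, best_reg, best_val_acc))
```

The two strongest settings, (1e-8, 5e6) and (1e-7, 5e4), both sit on the edge of the grid and are nearly tied, so widening the search beyond these ranges may be worth trying.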
# Evaluate your trained SVM on the test set: you should be able to get at least 0.40
y_test_pred = best_svm.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)
0.422
# An important way to gain intuition about how an algorithm works is to
# visualize the mistakes that it makes. In this visualization, we show examples
# of images that are misclassified by our current system. The first column
# shows images that our system labeled as "plane" but whose true label is
# something other than "plane".
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()
[Figure: output_10_0.png, a grid of misclassified test images with one column per predicted class]
Inline question 1:
Describe the misclassification results that you see. Do they make sense?
Your Answer:
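One way to ground an answer to this question is to count the mistakes rather than only look at them. The sketch below builds a confusion matrix with NumPy on a tiny made-up label set. The helper name confusion_matrix is introduced here and is not part of the assignment code.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Entry [i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Toy labels for three classes (made up for illustration).
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 0, 2])
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
```

In the notebook, the equivalent call would be confusion_matrix(y_test, y_test_pred, len(classes)). Large off-diagonal entries point to the class pairs that the HOG and color features fail to separate.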
Neural Network on image features
Earlier in this assignment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels.
For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy.
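As a rough picture of what such a network computes, here is a minimal forward pass, assuming an affine, ReLU, affine architecture with a softmax loss. That architecture, the 1e-3 weight scale, and the helper names two_layer_scores and softmax_loss are assumptions for this sketch, not a copy of TwoLayerNet. The dimensions mirror this notebook: 154 features, 500 hidden units, 10 classes.

```python
import numpy as np


def two_layer_scores(X, W1, b1, W2, b2):
    """Affine -> ReLU -> affine; returns class scores of shape (N, C)."""
    hidden = np.maximum(0, X @ W1 + b1)
    return hidden @ W2 + b2


def softmax_loss(scores, y):
    """Mean cross-entropy loss of softmax(scores) against integer labels y."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()


rng = np.random.default_rng(0)
N, D, H, C = 4, 154, 500, 10
X = rng.normal(size=(N, D))          # stand-in for standardized features
y = rng.integers(0, C, size=N)       # random labels, for illustration only

# Small random weights and explicit zero biases (assumed initialization).
W1, b1 = 1e-3 * rng.normal(size=(D, H)), np.zeros(H)
W2, b2 = 1e-3 * rng.normal(size=(H, C)), np.zeros(C)

scores = two_layer_scores(X, W1, b1, W2, b2)
loss = softmax_loss(scores, y)
print(scores.shape)  # (4, 10)
print(loss, np.log(C))
```

With near-zero scores the predicted distribution is almost uniform, so the loss sits close to ln(10), about 2.3026, which is in line with the first-iteration loss of 2.302549 in the training log in this notebook. The explicit b1 and b2 terms are also why a separate constant bias column in the features is unnecessary here.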
# Preprocessing: Remove the bias dimension
# Make sure to run this cell only ONCE
print(X_train_feats.shape)
X_train_feats = X_train_feats[:, :-1]
X_val_feats = X_val_feats[:, :-1]
X_test_feats = X_test_feats[:, :-1]
print(X_train_feats.shape)
(49000, 155)
(49000, 154)
from cs231n.classifiers.fc_net import TwoLayerNet
from cs231n.solver import Solver
input_dim = X_train_feats.shape[1]
hidden_dim = 500
num_classes = 10
data = {
    'X_train': X_train_feats,
    'y_train': y_train,
    'X_val': X_val_feats,
    'y_val': y_val,
    'X_test': X_test_feats,
    'y_test': y_test,
}
net = TwoLayerNet(input_dim, hidden_dim, num_classes)
best_net = None
################################################################################
# TODO: Train a two-layer neural network on image features. You may want to #
# cross-validate various parameters as in previous sections. Store your best #
# model in the best_net variable. #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
solver = Solver(net, data,
                update_rule='sgd',
                optim_config={'learning_rate': 3e-1},
                lr_decay=0.95,
                num_epochs=10,
                batch_size=100,
                print_every=100,
                verbose=True)
solver.train()
best_net = net
print('Validation accuracy:',solver.best_val_acc)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
(Iteration 1 / 4900) loss: 2.302549
(Epoch 0 / 10) train acc: 0.099000; val_acc: 0.079000
(Iteration 101 / 4900) loss: 1.470914
(Iteration 201 / 4900) loss: 1.350952
(Iteration 301 / 4900) loss: 1.233826
(Iteration 401 / 4900) loss: 1.454131
(Epoch 1 / 10) train acc: 0.530000; val_acc: 0.519000
(Iteration 501 / 4900) loss: 1.357862
(Iteration 601 / 4900) loss: 1.356539
(Iteration 701 / 4900) loss: 1.377141
(Iteration 801 / 4900) loss: 1.147534
(Iteration 901 / 4900) loss: 1.259492
(Epoch 2 / 10) train acc: 0.558000; val_acc: 0.560000
(Iteration 1001 / 4900) loss: 1.170850
(Iteration 1101 / 4900) loss: 0.994570
(Iteration 1201 / 4900) loss: 1.041930
(Iteration 1301 / 4900) loss: 1.072276
(Iteration 1401 / 4900) loss: 0.824319
(Epoch 3 / 10) train acc: 0.613000; val_acc: 0.573000
(Iteration 1501 / 4900) loss: 0.879039
(Iteration 1601 / 4900) loss: 1.027443
(Iteration 1701 / 4900) loss: 1.168138
(Iteration 1801 / 4900) loss: 0.929035
(Iteration 1901 / 4900) loss: 0.791239
(Epoch 4 / 10) train acc: 0.657000; val_acc: 0.594000
(Iteration 2001 / 4900) loss: 0.864596
(Iteration 2101 / 4900) loss: 0.875569
(Iteration 2201 / 4900) loss: 1.027501
(Iteration 2301 / 4900) loss: 1.188160
(Iteration 2401 / 4900) loss: 0.924593
(Epoch 5 / 10) train acc: 0.694000; val_acc: 0.598000
(Iteration 2501 / 4900) loss: 0.949799
(Iteration 2601 / 4900) loss: 0.771185
(Iteration 2701 / 4900) loss: 0.965558
(Iteration 2801 / 4900) loss: 0.707310
(Iteration 2901 / 4900) loss: 0.964054
(Epoch 6 / 10) train acc: 0.714000; val_acc: 0.573000
(Iteration 3001 / 4900) loss: 0.806378
(Iteration 3101 / 4900) loss: 0.867589
(Iteration 3201 / 4900) loss: 0.828627
(Iteration 3301 / 4900) loss: 0.752515
(Iteration 3401 / 4900) loss: 0.762842
(Epoch 7 / 10) train acc: 0.727000; val_acc: 0.571000
(Iteration 3501 / 4900) loss: 0.782613
(Iteration 3601 / 4900) loss: 0.733044
(Iteration 3701 / 4900) loss: 0.847719
(Iteration 3801 / 4900) loss: 0.679269
(Iteration 3901 / 4900) loss: 0.637614
(Epoch 8 / 10) train acc: 0.739000; val_acc: 0.584000
(Iteration 4001 / 4900) loss: 0.738450
(Iteration 4101 / 4900) loss: 0.636787
(Iteration 4201 / 4900) loss: 0.676468
(Iteration 4301 / 4900) loss: 0.685712
(Iteration 4401 / 4900) loss: 0.726976
(Epoch 9 / 10) train acc: 0.777000; val_acc: 0.600000
(Iteration 4501 / 4900) loss: 0.631218
(Iteration 4601 / 4900) loss: 0.902903
(Iteration 4701 / 4900) loss: 0.778198
(Iteration 4801 / 4900) loss: 0.768288
(Epoch 10 / 10) train acc: 0.795000; val_acc: 0.583000
Validation accuracy: 0.6
# Run your best neural net classifier on the test set. You should be able
# to get more than 55% accuracy.
y_test_pred = np.argmax(best_net.loss(data['X_test']), axis=1)
test_acc = (y_test_pred == data['y_test']).mean()
print(test_acc)
0.58
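This exercise is fairly simple; it just applies what we built in the earlier parts.
Assignment 1 is finally done!!
This shows that feature preprocessing matters in deep learning too!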