CS231n Assignment One (Part 3): SVM Classifier (0809)

阿桥今天吃饱了吗

Category: Computer Vision. Tags: neural networks

Article link: https://blog.csdn.net/yq1271/article/details/107949875

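This article explains the principles of the SVM (support vector machine), including the linear classifier, the loss function, and its gradient, and compares the SVM with KNN. It then implements an SVM classifier with preprocessing and SGD optimization, and examines the loss and time cost of the different implementations. Hyperparameter tuning is used to improve the model, which finally reaches a test-set accuracy of 0.37 on raw pixel data. The article also discusses how mismatches during gradient checking may come from the SVM loss not being strictly differentiable, and analyzes a visualization of the SVM weights.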

Summary generated automatically by CSDN.

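SVM (Support Vector Machine)

• Implement a fully vectorized loss function for the SVM
• Implement the fully vectorized expression for its analytic gradient
• Check your implementation with a numerical gradient
• Use a validation set to tune the learning rate and regularization strength
• Optimize the loss function with SGD
• Visualize the final learned weights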

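I. Principles

1.1 Linear classifier

A linear SVM assigns every sample a score for each class, and the score of the correct class should be higher than the scores of the incorrect classes. In practice, to make the classifier more robust, we want the correct score to exceed every incorrect score by at least a margin Δ.
Loss function: the hinge loss, also called the max-margin loss. For sample i with label y_i and class scores s_j:
$$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta)$$
Score vector, with x_i stored as a row vector and w_j the j-th column of W:
$$s = f(x_i; W) = x_i W, \qquad s_j = x_i w_j$$

Adding the regularization term over N samples (the factor 1/2 is optional; it only simplifies the gradient):
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i + \frac{1}{2} \lambda \sum_{k} \sum_{l} W_{k,l}^2$$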
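As a small worked example of the hinge loss and the regularized total loss, the sketch below evaluates them in plain Python. The function names `hinge_loss_single` and `total_loss` and the score values are illustrative assumptions, not code from the assignment.

```python
def hinge_loss_single(scores, correct_class, delta=1.0):
    """Multiclass hinge loss for one sample, given its list of class scores."""
    correct_score = scores[correct_class]
    loss = 0.0
    for j, score in enumerate(scores):
        if j == correct_class:
            continue  # the correct class contributes no term
        loss += max(0.0, score - correct_score + delta)
    return loss


def total_loss(per_sample_losses, weights, reg):
    """Average data loss plus the 0.5 * reg * sum(w^2) penalty (the 1/2 is optional)."""
    data_loss = sum(per_sample_losses) / len(per_sample_losses)
    penalty = 0.5 * reg * sum(w * w for w in weights)
    return data_loss + penalty


# Hypothetical scores for a three-class problem (illustrative values only).
scores = [3.2, 5.1, -1.7]
loss_if_class0 = hinge_loss_single(scores, correct_class=0)  # only class 1 violates the margin
loss_if_class1 = hinge_loss_single(scores, correct_class=1)  # every margin is satisfied
print(loss_if_class0, loss_if_class1)
```

With class 0 as the label, only class 1 contributes: 5.1 - 3.2 + 1 = 2.9, while class 2 gives max(0, -1.7 - 3.2 + 1) = 0. With class 1 as the label, both margins are satisfied and the loss is 0.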

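1.2 Gradient of the loss with respect to the weight matrix

Consider two cases, depending on how j relates to y_i. Here s_j = x_i w_j is the score of class j, w_j is the j-th column of W, x_i is a row vector, and Δ is the margin.
When j = y_i:
$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}(s_j - s_{y_i} + \Delta > 0)\Big)\, x_i^\top$$
When j ≠ y_i:
$$\nabla_{w_j} L_i = \mathbb{1}(s_j - s_{y_i} + \Delta > 0)\, x_i^\top$$
Apply this computation to every component of L_i that is greater than 0, sum over all samples, take the average, and add the gradient of the regularization term (λW when the 1/2 factor is used) to obtain dW.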
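The two gradient cases can be combined into one vectorized computation. The following is a minimal NumPy sketch, assuming X holds one sample per row (N, D), W has shape (D, C), and the 1/2 regularization convention is used. The names `svm_loss_vectorized` and `numerical_gradient` are illustrative and may differ from the assignment's own functions.

```python
import numpy as np


def svm_loss_vectorized(W, X, y, reg, delta=1.0):
    """Multiclass SVM loss and gradient.

    W: (D, C) weights, X: (N, D) data rows, y: (N,) integer labels.
    Uses 0.5 * reg * sum(W**2), so the penalty gradient is reg * W.
    """
    num_train = X.shape[0]
    rows = np.arange(num_train)

    scores = X @ W                                   # (N, C)
    correct = scores[rows, y][:, np.newaxis]         # (N, 1)
    margins = np.maximum(0.0, scores - correct + delta)
    margins[rows, y] = 0.0                           # no term for j == y_i

    loss = margins.sum() / num_train + 0.5 * reg * np.sum(W * W)

    # 1 where a margin is violated; the correct class gets minus the count.
    coeff = (margins > 0).astype(float)
    coeff[rows, y] = -coeff.sum(axis=1)
    dW = X.T @ coeff / num_train + reg * W
    return loss, dW


def numerical_gradient(f, W, h=1e-5):
    """Centered finite-difference gradient of the scalar function f at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                                 # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


# Small random problem (assumed sizes) to compare analytic and numerical gradients.
rng = np.random.default_rng(0)
X_small = rng.standard_normal((5, 4))
y_small = rng.integers(0, 3, size=5)
W_small = rng.standard_normal((4, 3)) * 0.01

loss, dW = svm_loss_vectorized(W_small, X_small, y_small, reg=0.1)
dW_num = numerical_gradient(
    lambda w: svm_loss_vectorized(w, X_small, y_small, reg=0.1)[0], W_small
)
print("max abs difference:", np.max(np.abs(dW - dW_num)))
```

The numerical check agrees closely here because the small weights keep every margin away from the kink at zero. Near a kink the hinge is not differentiable, and the two gradients can disagree in individual entries.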

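1.3 Advantages and disadvantages compared with KNN:

Advantages:
1. It learns its parameters from the training data, so classification at test time is fast;
2. It classifies using class scores learned from the training samples, and the margin gives it a degree of robustness;

Disadvantages:
1. It is not well suited to classes that intersect or overlap heavily;
2. A linear SVM cannot fit nonlinear decision boundaries;
3. It needs a fairly large number of samples to be available at once;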
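To make the speed comparison concrete, the sketch below contrasts prediction with a learned linear classifier against a brute-force 1-nearest-neighbour lookup. Both functions (`predict_linear`, `predict_1nn`) are hypothetical helpers written for illustration, not the assignment's code.

```python
import numpy as np


def predict_linear(W, X):
    """Linear classifier: one matrix product, independent of training-set size."""
    return np.argmax(X @ W, axis=1)


def predict_1nn(X_train, y_train, X):
    """1-nearest-neighbour: compares each query with every stored training row."""
    # Squared L2 distances, shape (num_queries, num_train).
    d2 = ((X[:, np.newaxis, :] - X_train[np.newaxis, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]


# Tiny illustrative data set with two classes.
X_train_toy = np.array([[0.0, 0.0], [10.0, 10.0]])
y_train_toy = np.array([0, 1])
X_query = np.array([[1.0, 1.0], [9.0, 8.0]])

print(predict_1nn(X_train_toy, y_train_toy, X_query))
print(predict_linear(np.eye(2), np.array([[2.0, 1.0], [0.0, 3.0]])))
```

The linear classifier only needs the (D, C) weight matrix at test time, whereas the nearest-neighbour predictor must keep and scan the whole training set for every query.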

II. Implementing the SVM classifier

2.1 Preprocessing

2.1.1 Print the shapes of the training and test sets

print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Output:

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

2.1.2 Splitting the dataset

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Output:

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

2.1.3 Reshaping

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

Output:

Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)
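The 3072 columns come from flattening each 32 x 32 x 3 image into a single row. The short sketch below, using a made-up array in place of the CIFAR-10 data, shows that the flattening is lossless and can be undone with another reshape.

```python
import numpy as np

# Stand-in for a batch of two 32x32 RGB images (synthetic values, not CIFAR-10).
images = np.arange(2 * 32 * 32 * 3, dtype=np.float64).reshape(2, 32, 32, 3)

# -1 lets NumPy infer the row length: 32 * 32 * 3 = 3072.
flat = np.reshape(images, (images.shape[0], -1))
print(flat.shape)

# Reshaping a row back recovers the original image layout.
restored = flat[0].reshape(32, 32, 3)
print(np.array_equal(restored, images[0]))
```

This round trip is what later allows a flattened vector such as the mean image to be reshaped to (32, 32, 3) for display.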

2.1.5 Subtracting the mean image

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1