2017cs231n assignment1(svm)

LiuLllDDdd

Article link: https://blog.csdn.net/qq_33891314/article/details/98478227

Formulas

SVM loss function

$\Large L_i = \sum_{j\neq{y_i}} \max(0,s_j-s_{y_i}+1)$
$\Large L(W) = \underbrace{ \frac{1}{N}\sum_{i=1}^N L_i(f(x_i,W),y_i) } + \underbrace{\lambda R(W)}$
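
As a small worked example of the per-sample loss $L_i$: the scores below are made-up numbers, and `svm_loss_single` is a hypothetical helper name, not part of the assignment code.

```python
import numpy as np

def svm_loss_single(scores, correct_class, delta=1.0):
    """Multiclass hinge loss L_i for one sample."""
    margins = np.maximum(0.0, scores - scores[correct_class] + delta)
    margins[correct_class] = 0.0  # the sum skips j == y_i
    return float(np.sum(margins))

scores = np.array([3.2, 5.1, -1.7])  # illustrative class scores s_j
loss_i = svm_loss_single(scores, correct_class=0)
print(loss_i)
```

Only the second class violates the margin (5.1 - 3.2 + 1 is about 2.9), so the loss for this sample is about 2.9; the third class contributes nothing because its margin is negative.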

Regularization

Improves the model's ability to generalize.
L2 regularization (weight decay)
$\Large R(W) = \sum_k\sum_lW_{k,l}^2$

Gradient derivation

TODO:
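
The derivation is left as a TODO above. One standard way to write it, consistent with the loss $L_i$ defined earlier and assuming the scores are $s_j = x_i W_{:,j}$ (the indicator notation $\mathbb{1}[\cdot]$ is introduced here for illustration), is roughly:

$\Large \nabla_{W_{:,j}} L_i = \mathbb{1}[s_j - s_{y_i} + 1 > 0]\, x_i^T \quad (j \neq y_i)$
$\Large \nabla_{W_{:,y_i}} L_i = -\Big(\sum_{j\neq{y_i}} \mathbb{1}[s_j - s_{y_i} + 1 > 0]\Big)\, x_i^T$
$\Large \nabla_W R(W) = 2W$

The full gradient is then the average of $\nabla_W L_i$ over the $N$ samples plus $2\lambda W$. If the penalty is instead written as $\frac{1}{2}\lambda R(W)$, the regularization term contributes $\lambda W$.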

Code

svm.ipynb

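In this exercise you will:

implement a fully-vectorized loss function for the SVM
implement the fully-vectorized expression for its analytic gradient
check your implementation using the numerical gradient
use a validation set to tune the learning rate and regularization strength
optimize the loss function with stochastic gradient descent (SGD)
visualize the final learned weights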

Data preprocessing

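import numpy as np
import matplotlib.pyplot as plt

# Assumes X_train, y_train, X_test, y_test already hold the raw CIFAR-10 arrays.
# Split the data into train, val, and test sets. In addition we create a small
# development set as a subset of the training data; using it makes our code run faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set is num_validation points from the original training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set is the first num_training points from the original training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We also make a development set, which is a small subset of the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]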

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape: (49000, 32, 32, 3)
Train labels shape: (49000,)
Validation data shape: (1000, 32, 32, 3)
Validation labels shape: (1000,)
Test data shape: (1000, 32, 32, 3)
Test labels shape: (1000,)

# Preprocessing: reshape the image data into rows (one flat vector per image)
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

Training data shape: (49000, 3072)
Validation data shape: (1000, 3072)
Test data shape: (1000, 3072)
dev data shape: (500, 3072)

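# Preprocessing: subtract the mean image
# first: compute the mean image from the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

# second: subtract the mean image from the train, val, test and dev data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# third: append a column of ones as the bias dimension (the bias trick), so that
# our SVM only has to optimize a single weight matrix W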
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
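
The bias trick in the third step can be checked on toy data. The sketch below uses small made-up shapes and hypothetical names (`W_only`, `b`, `W_ext`); it is only meant to show that appending a column of ones to the data and a row of biases to the weights reproduces `X.dot(W) + b`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))        # 4 samples, 5 features (toy sizes)
W_only = rng.standard_normal((5, 3))   # weights for 3 classes
b = rng.standard_normal(3)             # one bias per class

# Append a column of ones to X and the bias as an extra last row of W.
X_ext = np.hstack([X, np.ones((X.shape[0], 1))])
W_ext = np.vstack([W_only, b])

scores_explicit = X.dot(W_only) + b
scores_trick = X_ext.dot(W_ext)
print(np.allclose(scores_explicit, scores_trick))
```

This is also why the weight matrix used later has 3073 rows rather than 3072: the last row plays the role of the bias.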


linear_svm.py

Naive implementation of the SVM loss function


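import numpy as np

def svm_loss_naive(W, X, y, reg):
    """
    SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on
    minibatches of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as a single float
    - gradient with respect to weights W; an array of the same shape as W
    """
    dW = np.zeros(W.shape) # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1 # note delta = 1
            if margin > 0:
                loss += margin
                # each violated margin pulls the correct-class column down
                # and pushes the violating class column up
                dW[:, y[i]] -= X[i, :]
                dW[:, j] += X[i, :]

    # Right now the loss and gradient are sums over all training examples;
    # divide by num_train to turn them into averages.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss. The factor 0.5 makes the gradient of
    # this term exactly reg * W, the same convention as svm_loss_vectorized.
    loss += 0.5 * reg * np.sum(W * W)

    # Add the gradient of the regularization term. The data-loss gradient was
    # already accumulated inside the loop, at the same time as the loss.
    dW += reg * W

    return loss, dW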

svm.ipynb

Checking the computed loss and gradient

# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))

loss: 9.190614

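In the starter code, the gradient returned by the function above is all zeros. Derive the gradient of the SVM loss function and implement it inside svm_loss_naive. You will find it helpful to interleave the new code with the existing function.

To check that the gradient is implemented correctly, you can estimate the gradient of the loss numerically and compare that estimate with the analytic gradient you computed. The course provides code for this: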

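# Once you have implemented the gradient, recompute it with the code below
# and check it with the function provided by the course.
# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions and
# compare it with the analytic gradient. The numbers should match almost exactly
# along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# Do the gradient check once again, this time with regularization turned on.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)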

numerical: 19.849072 analytic: 19.849072, relative error: 5.520298e-12
numerical: -1.504892 analytic: -1.504892, relative error: 6.562606e-11
numerical: 11.812403 analytic: 11.812403, relative error: 8.660196e-12
numerical: -63.704597 analytic: -63.704597, relative error: 1.820453e-12
numerical: -5.407408 analytic: -5.407408, relative error: 4.599594e-11
numerical: -36.508291 analytic: -36.508291, relative error: 2.907698e-12
numerical: 23.779592 analytic: 23.779592, relative error: 1.577011e-11
numerical: 29.353032 analytic: 29.353032, relative error: 1.772875e-11
numerical: 12.224395 analytic: 12.180846, relative error: 1.784436e-03
numerical: 2.890912 analytic: 2.890912, relative error: 4.758505e-11
numerical: -44.356949 analytic: -44.356920, relative error: 3.171135e-07
numerical: 0.149256 analytic: 0.141826, relative error: 2.552577e-02
numerical: 11.976893 analytic: 11.972968, relative error: 1.638863e-04
numerical: -11.664448 analytic: -11.667394, relative error: 1.262617e-04
numerical: 10.137097 analytic: 10.133080, relative error: 1.981414e-04
numerical: -19.628820 analytic: -19.624164, relative error: 1.186037e-04
numerical: 20.765163 analytic: 20.723335, relative error: 1.008178e-03
numerical: 5.783737 analytic: 5.785277, relative error: 1.330772e-04
numerical: -29.571995 analytic: -29.667418, relative error: 1.610805e-03
numerical: -2.112462 analytic: -2.107824, relative error: 1.099038e-03
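
For reference, the idea behind such a check can be sketched as below. This is not the course's `grad_check_sparse`: `numeric_grad_at`, `relative_error` and the quadratic test function are hypothetical stand-ins, and the centered-difference step `h` is an assumed value.

```python
import numpy as np

def numeric_grad_at(f, w, idx, h=1e-5):
    """Centered-difference estimate of df/dw[idx]; w is restored afterwards."""
    old = w[idx]
    w[idx] = old + h
    f_plus = f(w)
    w[idx] = old - h
    f_minus = f(w)
    w[idx] = old
    return (f_plus - f_minus) / (2 * h)

def relative_error(a, b):
    """|a - b| / (|a| + |b|), guarded against division by zero."""
    return abs(a - b) / max(abs(a) + abs(b), 1e-12)

# Toy function with a known gradient: f(w) = sum(w**2), so df/dw = 2w.
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 2))
f = lambda w_: float(np.sum(w_ ** 2))
analytic = 2 * w

idx = (1, 0)  # one coordinate to check
numeric = numeric_grad_at(f, w, idx)
err = relative_error(numeric, analytic[idx])
print('numerical: %f analytic: %f, relative error: %e' % (numeric, analytic[idx], err))
```

With a hinge loss, an occasional larger relative error can also appear when a margin sits very close to the kink at zero, because the loss is not differentiable there.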

linear_svm.py

Fully vectorized implementation of the SVM loss function

def svm_loss_vectorized(W, X, y, reg):
    """
    SVM loss function, fully vectorized implementation.

    Inputs and outputs are the same as svm_loss_naive.
    """
    loss = 0.0
    dW = np.zeros(W.shape) # initialize the gradient as zero

    # compute the loss in a vectorized way
    scores = X.dot(W)                                   # (N, C)
    num_classes = W.shape[1]
    num_train = X.shape[0]

    scores_correct = scores[np.arange(num_train), y]    # (N,)
    scores_correct = np.reshape(scores_correct, (num_train, -1))  # (N, 1)
    margins = scores - scores_correct + 1
    margins = np.maximum(0, margins)
    margins[np.arange(num_train), y] = 0                # correct class adds no loss
    loss += np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)

    # compute the gradient in a vectorized way
    margins[margins > 0] = 1                            # 1 where a margin is violated
    row_sum = np.sum(margins, axis=1)                   # (N,) violations per sample
    margins[np.arange(num_train), y] = -row_sum
    dW += np.dot(X.T, margins) / num_train + reg * W    # (D, C)

    return loss, dW
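
To see what the coefficient matrix built from `margins` looks like, here is a toy run with made-up scores for two samples and three classes. The steps mirror the function above, but the numbers and the name `mask` are illustrative only:

```python
import numpy as np

scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9,  2.0]])
y = np.array([0, 1])
num_train = scores.shape[0]

correct = scores[np.arange(num_train), y].reshape(num_train, 1)
margins = np.maximum(0, scores - correct + 1)
margins[np.arange(num_train), y] = 0
data_loss = np.sum(margins) / num_train

# Turn margins into the coefficient matrix used for the gradient:
# +1 for each violating class, -(number of violations) for the correct class.
mask = (margins > 0).astype(float)
mask[np.arange(num_train), y] = -np.sum(mask, axis=1)
print(data_loss)
print(mask)
```

The first sample has one violated margin, so its row holds -1 in the correct-class column and +1 in the violating column; the second sample violates nothing and its row is all zeros. Multiplying `X.T` by this matrix and dividing by the number of samples gives the data-loss gradient.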

svm.ipynb

Compare whether the two implementations give the same result, and how long each takes

# Next, implement the function svm_loss_vectorized.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))

Naive loss: 9.023543e+00 computed in 0.016366s
Vectorized loss: 9.023543e+00 computed in 0.004959s
difference: 0.000000

# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)

Naive loss and gradient: computed in 0.127969s
Vectorized loss and gradient: computed in 0.003472s
difference: 0.000000

linear_classifier.py

Train the linear classifier, using stochastic gradient descent (SGD) to find the W that minimizes the loss

def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
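        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is the number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W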
        loss_history = []
        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in           #
            # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
            # and y_batch should have shape (batch_size,)                           #
            #                                                                       #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            batch_inx = np.random.choice(num_train, batch_size)
            X_batch = X[batch_inx, :]
            y_batch = y[batch_inx]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            self.W = self.W - learning_rate * grad

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history
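
The sampling step can be tried in isolation. In this sketch the array sizes are toy values and `rng` is a local NumPy generator rather than the global `np.random` used above:

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, dim, batch_size = 50, 4, 8
X = rng.standard_normal((num_train, dim))
y = rng.integers(0, 3, size=num_train)

# Sample indices with replacement (the default), as the hint suggests.
batch_idx = rng.choice(num_train, batch_size)
X_batch = X[batch_idx]
y_batch = y[batch_idx]
print(X_batch.shape, y_batch.shape)
```

Because the same index array selects rows from both `X` and `y`, each sampled row stays paired with its own label.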

svm.ipynb

Inspecting the optimization result

# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))

iteration 0 / 1500: loss 416.840860
iteration 100 / 1500: loss 240.456190
iteration 200 / 1500: loss 145.978587
iteration 300 / 1500: loss 89.814848
iteration 400 / 1500: loss 56.253875
iteration 500 / 1500: loss 35.668818
iteration 600 / 1500: loss 23.341002
iteration 700 / 1500: loss 16.464000
iteration 800 / 1500: loss 11.249742
iteration 900 / 1500: loss 8.815990
iteration 1000 / 1500: loss 7.325125
iteration 1100 / 1500: loss 6.763167
iteration 1200 / 1500: loss 6.112349
iteration 1300 / 1500: loss 5.843079
iteration 1400 / 1500: loss 5.215074
That took 6.863122s

# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

[Figure: loss value versus iteration number]

linear_classifier.py

Predicting image labels

    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        score = X.dot(self.W)
        y_pred = np.argmax(score,axis=1)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return y_pred
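
On a toy weight matrix, the prediction rule is simply the column index of the largest score in each row. `W_toy` and `X_toy` are made-up values for illustration:

```python
import numpy as np

W_toy = np.array([[1.0, 0.0, -1.0],
                  [0.0, 2.0,  0.5]])    # shape (D=2, C=3)
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [-1.0, 0.1]])         # shape (N=3, D=2)

scores = X_toy.dot(W_toy)               # shape (N, C)
y_pred = np.argmax(scores, axis=1)      # best class per row
print(y_pred)
```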

svm.ipynb

# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

training accuracy: 0.382776
validation accuracy: 0.384000

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.

#Note: you may see runtime/overflow warnings during hyper-parameter search. 
# This may be caused by extreme values, and is not a bug.

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

for rate in learning_rates:
    for regular in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=rate, reg=regular,
                      num_iters=1000)
        y_train_pred = svm.predict(X_train)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        accuracy_val = np.mean(y_val == y_val_pred)
        results[(rate, regular)]=(accuracy_train, accuracy_val)
        if (best_val < accuracy_val):
            best_val = accuracy_val
            best_svm = svm
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.372857 val accuracy: 0.395000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.370347 val accuracy: 0.387000
lr 5.000000e-05 reg 2.500000e+04 train accuracy: 0.156204 val accuracy: 0.168000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.052796 val accuracy: 0.054000
best validation accuracy achieved during cross-validation: 0.395000
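
The two-value lists above give only four combinations. One possible way to search a finer grid is to generate log-spaced candidates; the ranges and counts below are assumptions for illustration, not values from the assignment:

```python
import itertools
import numpy as np

# Three log-spaced candidates per hyperparameter (assumed ranges).
learning_rates = np.logspace(-7.5, -6.5, num=3)
regularization_strengths = np.logspace(4, 5, num=3)

# Every (learning_rate, regularization_strength) pair to try.
grid = list(itertools.product(learning_rates, regularization_strengths))
print(len(grid))
```

Each pair in `grid` could then be passed to the same train-and-evaluate loop used above.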

# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()

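[Figure: CIFAR-10 training accuracy (top) and validation accuracy (bottom), plotted against log learning rate and log regularization strength]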

# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)

linear SVM on raw pixels final test set accuracy: 0.357000

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
      
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Figure: learned weights visualized as an image for each of the 10 classes]
