CS231n 2017 Assignment 1 (SVM)

Formulas

SVM loss function

$$L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)$$

$$L(W) = \underbrace{\frac{1}{N}\sum_{i=1}^N L_i\big(f(x_i, W),\, y_i\big)}_{\text{data loss}} + \underbrace{\lambda R(W)}_{\text{regularization loss}}$$
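
As a quick illustration (toy scores, not taken from the assignment), the per-example hinge loss can be computed directly in numpy:

import numpy as np

# Made-up scores for one example with 3 classes; the correct class is y = 0, delta = 1
scores = np.array([3.2, 5.1, -1.7])
y = 0
margins = np.maximum(0, scores - scores[y] + 1)
margins[y] = 0                 # the correct class contributes nothing
L_i = np.sum(margins)          # max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9
print(L_i)                     # 2.9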

Regularization

Regularization improves the generalization ability of the model.

L2 regularization (weight decay):

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

Gradient derivation
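
Differentiating each hinge term gives the per-example gradient, which is exactly what the implementations below accumulate:

$$\nabla_{w_j} L_i = \mathbb{1}\big(s_j - s_{y_i} + 1 > 0\big)\, x_i \qquad (j \neq y_i)$$

$$\nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\big(s_j - s_{y_i} + 1 > 0\big)\Big)\, x_i$$

where $\mathbb{1}(\cdot)$ is the indicator function and $w_j$ denotes the $j$-th column of $W$.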

Code

svm.ipynb

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient checking
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

Data preprocessing

# Split the data into train, val, and test sets.
# In addition we will create a small development set as a subset of the training data; we will use it so that our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_training points from the original training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape: (49000, 32, 32, 3)
Train labels shape: (49000,)
Validation data shape: (1000, 32, 32, 3)
Validation labels shape: (1000,)
Test data shape: (1000, 32, 32, 3)
Test labels shape: (1000,)

# Preprocessing: reshape the image data into rows (one vector per image)
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

Training data shape: (49000, 3072)
Validation data shape: (1000, 3072)
Test data shape: (1000, 3072)
dev data shape: (500, 3072)

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

# second: subtract the mean image from the train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# third: append the bias dimension of ones so that our SVM only has to worry about optimizing a single weight matrix W
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
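
Appending the column of ones is the usual bias trick: folding the bias into an extra row of $W$ lets the score function be a single matrix product,

$$\begin{bmatrix} x & 1 \end{bmatrix} \begin{bmatrix} W \\ b \end{bmatrix} = xW + b,$$

which is why the arrays above have 3073 columns rather than 3072.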

[Figure: visualization of the mean training image]

linear_svm.py

Naive implementation of the SVM loss function


def svm_loss_naive(W, X, y, reg):
    """
    使用循环构造SVM损失函数

    输入有维度D,有C类,我们使用N个样本作为一批输入

    Inputs:
    - W: 保存权重的numpy数组,形状为(D, C) 
    - X: 保存一批数据的numpy数组,形状为(N, D)
    - y: 保存训练标签的numpy数组,形状为(N,); y[i] = c 表示X[i]标签为c, 其中 0 <= c < C.
    - reg: (float) 正则化强度

    Returns a tuple of:
    - 一个存储为float的loss
    - 权重W的梯度,和W大小相同的array
    """
    dW = np.zeros(W.shape) # 初始化梯度为0

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                dW[:, y[i]] -= X[i, :]  # correct-class column accumulates -x_i for each positive margin
                dW[:, j] += X[i, :]     # column j accumulates +x_i

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead, so we divide by num_train; the same goes for dW.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss.
    loss += reg * np.sum(W * W)

    # Add the regularization contribution to the gradient. Rather than first
    # computing the loss and then computing the derivative, the data part of
    # the gradient is accumulated inside the loop above, at the same time the
    # loss is being computed.
    dW += reg * W

    return loss, dW

svm.ipynb

Check the computed loss and gradient

# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001 

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))

loss: 9.190614

The gradient returned from the function above is right now all zero. Derive and implement the gradient for the SVM loss function inline inside svm_loss_naive. You will find it helpful to interleave your new code into the existing function.

To check that you have correctly implemented the gradient, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient you computed. We have provided code that does this for you:
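
For reference, the numerical check perturbs one coordinate of W at a time and uses a centered difference. Below is a minimal sketch of the idea (a hypothetical helper, not the provided grad_check_sparse):

import numpy as np

def numeric_grad_at(f, W, ix, h=1e-5):
    """Centered-difference estimate of the gradient of f at a single coordinate ix of W."""
    old_value = W[ix]
    W[ix] = old_value + h
    fxph = f(W)                  # f evaluated with W[ix] increased by h
    W[ix] = old_value - h
    fxmh = f(W)                  # f evaluated with W[ix] decreased by h
    W[ix] = old_value            # restore the original value
    return (fxph - fxmh) / (2 * h)

# Usage sketch: compare with the analytic gradient at one random coordinate.
# ix = tuple(np.random.randint(d) for d in W.shape)
# num = numeric_grad_at(lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0], W, ix)
# print(num, grad[ix])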

# Once you've implemented the gradient, recompute it with the code below and gradient-check it with the function we provide.
# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions and compare them with your analytically computed gradient. The numbers should match almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# Do the gradient check once again, this time with regularization turned on.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)

numerical: 19.849072 analytic: 19.849072, relative error: 5.520298e-12
numerical: -1.504892 analytic: -1.504892, relative error: 6.562606e-11
numerical: 11.812403 analytic: 11.812403, relative error: 8.660196e-12
numerical: -63.704597 analytic: -63.704597, relative error: 1.820453e-12
numerical: -5.407408 analytic: -5.407408, relative error: 4.599594e-11
numerical: -36.508291 analytic: -36.508291, relative error: 2.907698e-12
numerical: 23.779592 analytic: 23.779592, relative error: 1.577011e-11
numerical: 29.353032 analytic: 29.353032, relative error: 1.772875e-11
numerical: 12.224395 analytic: 12.180846, relative error: 1.784436e-03
numerical: 2.890912 analytic: 2.890912, relative error: 4.758505e-11
numerical: -44.356949 analytic: -44.356920, relative error: 3.171135e-07
numerical: 0.149256 analytic: 0.141826, relative error: 2.552577e-02
numerical: 11.976893 analytic: 11.972968, relative error: 1.638863e-04
numerical: -11.664448 analytic: -11.667394, relative error: 1.262617e-04
numerical: 10.137097 analytic: 10.133080, relative error: 1.981414e-04
numerical: -19.628820 analytic: -19.624164, relative error: 1.186037e-04
numerical: 20.765163 analytic: 20.723335, relative error: 1.008178e-03
numerical: 5.783737 analytic: 5.785277, relative error: 1.330772e-04
numerical: -29.571995 analytic: -29.667418, relative error: 1.610805e-03
numerical: -2.112462 analytic: -2.107824, relative error: 1.099038e-03

linear_svm.py

Fully-vectorized implementation of the SVM loss function

def svm_loss_vectorized(W, X, y, reg):
    """
    构造一个SVM损失函数,全向量化实现
    """
    loss = 0.0
    dW = np.zeros(W.shape) # 梯度初始化为0

    # 使用向量化的方法求loss
    scores = X.dot(W)        
    num_classes = W.shape[1]
    num_train = X.shape[0]
 
    scores_correct = scores[np.arange(num_train), y] 
    scores_correct = np.reshape(scores_correct, (num_train, -1)) 
    margins = scores - scores_correct + 1
    margins = np.maximum(0,margins)
    margins[np.arange(num_train), y] = 0
    loss += np.sum(margins) / num_train
    loss += 0.5 * reg * np.sum(W * W)
   
    margins[margins > 0] = 1
    row_sum = np.sum(margins, axis=1)                  # 1 by N
    margins[np.arange(num_train), y] = -row_sum        
    dW += np.dot(X.T, margins)/num_train + reg * W     # D by C

    return loss, dW

svm.ipynb

Compare whether the two implementations give the same result, and how long each takes to compute

# Next, implement the function svm_loss_vectorized.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))

Naive loss: 9.023543e+00 computed in 0.016366s
Vectorized loss: 9.023543e+00 computed in 0.004959s
difference: 0.000000

# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)

Naive loss and gradient: computed in 0.127969s
Vectorized loss and gradient: computed in 0.003472s
difference: 0.000000

linear_classifier.py

Train a linear classifier: use stochastic gradient descent (SGD) to find the W that minimizes the loss
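
Each SGD step samples a minibatch and moves $W$ a small step against the gradient of the minibatch loss:

$$W \leftarrow W - \eta\, \nabla_W L(W)$$

where $\eta$ is the learning rate; this is the `self.W = self.W - learning_rate * grad` update in the code below.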

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are
          N training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing.
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is the number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):
            X_batch = None
            y_batch = None

            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in           #
            # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
            # and y_batch should have shape (batch_size,)                           #
            #                                                                       #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            batch_inx = np.random.choice(num_train, batch_size)
            X_batch = X[batch_inx, :]
            y_batch = y[batch_inx]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            self.W = self.W - learning_rate * grad

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history

svm.ipynb

Inspect the optimization results

# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))

iteration 0 / 1500: loss 416.840860
iteration 100 / 1500: loss 240.456190
iteration 200 / 1500: loss 145.978587
iteration 300 / 1500: loss 89.814848
iteration 400 / 1500: loss 56.253875
iteration 500 / 1500: loss 35.668818
iteration 600 / 1500: loss 23.341002
iteration 700 / 1500: loss 16.464000
iteration 800 / 1500: loss 11.249742
iteration 900 / 1500: loss 8.815990
iteration 1000 / 1500: loss 7.325125
iteration 1100 / 1500: loss 6.763167
iteration 1200 / 1500: loss 6.112349
iteration 1300 / 1500: loss 5.843079
iteration 1400 / 1500: loss 5.215074
That took 6.863122s

# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

[Figure: loss value vs. iteration number]

linear_classifier.py

Predict image labels

    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[0])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        score = X.dot(self.W)
        y_pred = np.argmax(score,axis=1)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return y_pred

svm.ipynb

# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

training accuracy: 0.382776
validation accuracy: 0.384000

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.

#Note: you may see runtime/overflow warnings during hyper-parameter search. 
# This may be caused by extreme values, and is not a bug.

learning_rates = [1e-7, 5e-5]
regularization_strengths = [2.5e4, 5e4]

# results is a dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

for rate in learning_rates:
    for regular in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=rate, reg=regular,
                      num_iters=1000)
        y_train_pred = svm.predict(X_train)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        accuracy_val = np.mean(y_val == y_val_pred)
        results[(rate, regular)]=(accuracy_train, accuracy_val)
        if (best_val < accuracy_val):
            best_val = accuracy_val
            best_svm = svm
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.372857 val accuracy: 0.395000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.370347 val accuracy: 0.387000
lr 5.000000e-05 reg 2.500000e+04 train accuracy: 0.156204 val accuracy: 0.168000
lr 5.000000e-05 reg 5.000000e+04 train accuracy: 0.052796 val accuracy: 0.054000
best validation accuracy achieved during cross-validation: 0.395000

# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()

[Figure: CIFAR-10 training and validation accuracy over the (log learning rate, log regularization strength) grid]

# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)

linear SVM on raw pixels final test set accuracy: 0.357000

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
      
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Figure: visualization of the learned weights for the ten CIFAR-10 classes]
