Cs231n作业：Q1-3 Softmax

最新推荐文章于 2023-05-27 08:52:12 发布

一位以泪洗面的同学

最新推荐文章于 2023-05-27 08:52:12 发布

阅读量469

点赞数

分类专栏： Cs231n作业

本文链接：https://blog.csdn.net/qq_37041483/article/details/100515696

版权

Cs231n作业专栏收录该内容

4 篇文章 1 订阅

订阅专栏

Cs231n作业：Q1-3 Softmax exercise

Softmax exercise
- 梯度计算——摘自https://blog.csdn.net/yc461515457/article/details/51924604
Softmax Classifier
- lnline Question 1
- lnline Question 2-Ture or False

Softmax exercise

梯度计算——摘自https://blog.csdn.net/yc461515457/article/details/51924604

首先是给出Loss公式：
$\frac{1}{N} \sum_i L_i }+ { \lambda R(W) } ---（1）$
共有N个样本，每个样本带来的Loss是L_i：
$L_i =-\log{p_{y_i}}= -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) = -f_{y_i} + \log\sum_j e^{f_j} ---（2）$
$对于每一个样本X_i，由于softmax的分母对所有的f_j进行了累积求和，$ $所以L_i对W的导数对W的每一列都有贡献，即\frac{\partial{L_i}}{W_j}对所有的j都不为0：$
$y_i时：\frac {\partial{L_i}} {\partial{W_j}} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } \frac{\partial f_j}{\partial W_j} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } X_i^T ---（3）$
$y_i时：\frac {\partial{L_i}} {\partial{W_j}} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } \frac{\partial f_j}{\partial W_j} = \frac{e^{f_{j}}}{ \sum_j e^{f_j} } X_i^T - X_i^T---（4）$
这个练习类似于SVM练习。你会:
•为Softmax分类器实现一个全矢量化的损失函数
•实现其解析梯度的全矢量表达式
•检查您的实现与数值梯度
•使用验证集来调整学习速度和正则化强度
•使用SGD优化损失函数
•想象最终学习到的重量

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading extenrnal modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

获取CIFAR10数据：

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    
    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    try:
       del X_train, y_train
       del X_test, y_test
       print('Clear previously loaded data.')
    except:
       pass

    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # subsample the data(子样品的数据)
    mask = list(range(num_training, num_training + num_validation))  # 49000-50000的1000个
    X_val = X_train[mask]  # 训练集后1000个，用于验证
    y_val = y_train[mask]
    mask = list(range(num_training))  # 49000
    X_train = X_train[mask]  # 训练集前49000个，用于训练
    y_train = y_train[mask]
    mask = list(range(num_test))  # 1000
    X_test = X_test[mask]  # 测试集前1000个，用于测试
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_dev, replace=False)  # 从训练集0-49000中随机选500个数据作为开发集，并且不能重用元素
    X_dev = X_train[mask]
    y_dev = y_train[mask]
    
    # Preprocessing: reshape the image data into rows（预处理，将数据图像重塑为行）
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    
    # Normalize the data: subtract the mean image（标准化数据:减去图像的平均值）
    mean_image = np.mean(X_train, axis = 0)  # 训练集每个特征求平均值
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
    
    # add bias dimension and transform into columns（添加偏差维度并转换为列）
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)

输出：

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.

打开文件cs231n/classifier /softmax.py并实现softmax_loss_naive函数。

from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax损失函数，简单的实现(带有循环)输入有维数D，有C类，我们操作的是小批量N的例子。
    输入:
        - W:一个包含权重的形状(D, C)的数字数组。
        - X:形状(N, D)的数字数组，包含少量数据。
        - y:包含训练标签的形状(N，)的numpy数组;y[i] = c表示
        X[i]有标签c，其中0 <= c < c。
        - reg:(float)正则化强度
    返回一个元组:
        -单浮子损失
        -权值W的梯度;与W形状相同的数组
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)  # 3073x10

    #############################################################################
    # TODO: 使用显式循环计算软最大值损失及其梯度。                                  #
    # 将损耗存储在损耗中，梯度存储在dW中。如果在这里不小心，很容易遇到数值不稳定。     #
    # 不要忘记正则化!                                                            #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = X.shape[0]  # 获取样本数量500
    num_class= W.shape[1]  # 获取特征3073
    for i in range(num_train):
    	scores_i = X[i].dot(W)  # 1x3073  * 3073x10 = 1x10
    	exp_scores = np.exp(scores_i)  # e^scores的数组 1x10
    	sum_scores = np.sum(exp_scores)  # 求所有分数的总和 1x1
    	exp_scores /= sum_scores
    	correct_score = exp_scores[y[i]]  # 获取真正标签所对应的分数 1x1
    	loss += -np.log(correct_score)  # 计算该样本的损失
    	for j in range(num_class):
    		if j != y[i]:
    			dW[:,j] += exp_scores[j]*X[i]
    		else:
    			dW[:,j] += (exp_scores[y[i]]-1)*X[i]
    			
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    loss /= num_train  # 获取该样本的平均损失
    dW /= num_train
    loss += reg * np.sum(W * W)  # 加入正则化，得到完整的损失函数
    dW += reg * W
    return loss, dW

# First implement the naive softmax loss function with nested loops.首先用嵌套循环实现朴素的softmax损失函数。
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
# 打开文件cs231n/classifier /softmax.py并实现
# softmax_loss_naive函数。
from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
# 生成一个随机的软最大权矩阵，并使用它来计算损失。
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
# 作为一个粗略的完整性检查，我们的损失应该接近-log(0.1)。
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))

输出：

loss: 2.354422
sanity check: 2.302585

lnline Question 1

为什么我们期望损失接近-log(0.1)呢?简要解释。
Your Answer:
$由公式：L_i = -\log\left(\frac{e^{s_{y_i}}}{ \sum_j e^{s_j} }\right) 知：$ 因为是随机初始化并且类的总和是10,所以正确预测类数的可能是1/10，那么损失将为-log(0.1)。

# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
# 完成softmax_loss_naive的实现，并实现一个使用嵌套循环的渐变(naive)版本。
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
# 与我们对SVM所做的一样，使用数值梯度检查作为调试工具。数值梯度应接近解析梯度。
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to SVM case, do another gradient check with regularization
# 与SVM的情况类似，用正则化方法再做一次梯度检验
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

与我们对SVM所做的一样，使用数值梯度检查作为调试工具。数值梯度应接近解析梯度。
与SVM的情况类似，用正则化方法再做一次梯度检验：

numerical: 1.895536 analytic: 1.895536, relative error: 4.075329e-08
numerical: 1.145960 analytic: 1.145960, relative error: 7.754185e-09
numerical: 0.520816 analytic: 0.520816, relative error: 1.751680e-09
numerical: 0.019831 analytic: 0.019830, relative error: 1.837633e-06
numerical: -1.927560 analytic: -1.927560, relative error: 3.272616e-09
numerical: -2.331816 analytic: -2.331816, relative error: 1.606302e-08
numerical: 1.795485 analytic: 1.795485, relative error: 4.357714e-08
numerical: 2.946119 analytic: 2.946119, relative error: 2.266958e-08
numerical: 0.776307 analytic: 0.776307, relative error: 3.997289e-09
numerical: 2.914070 analytic: 2.914070, relative error: 6.318577e-09
numerical: -0.553924 analytic: -0.549033, relative error: 4.434225e-03
numerical: 0.000276 analytic: -0.001898, relative error: 1.000000e+00
numerical: 0.949624 analytic: 0.939115, relative error: 5.563964e-03
numerical: 1.651383 analytic: 1.655799, relative error: 1.335457e-03
numerical: -0.736941 analytic: -0.741462, relative error: 3.057579e-03
numerical: 2.944667 analytic: 2.947244, relative error: 4.372780e-04
numerical: 3.311702 analytic: 3.317417, relative error: 8.620659e-04
numerical: -1.625303 analytic: -1.628303, relative error: 9.221845e-04
numerical: 1.819605 analytic: 1.810454, relative error: 2.520883e-03
numerical: 0.428718 analytic: 0.432026, relative error: 3.843106e-03

既然我们已经有了softmax loss函数及其梯度的一个简单实现，那么就用softmax_loss_vectorized实现一个矢量化的版本。这两个版本应该计算相同的结果，但是矢量化版本应该更快。

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    
    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    scores = X.dot(W)  # 计算分数矩阵。 500x3073 * 3073x10 = 500x10
    exp_scores = np.exp(scores)  # e^scores.  500x10 
    sum_scores = np.sum(exp_scores, axis = 1)  # 各个类所对应的分数总和。 (500,)
    exp_scores /= sum_scores[:,np.newaxis]  # 标准化后的概率 500x10
    
    loss_matrix = -np.log(exp_scores[range(num_train),y])  # 500,
    loss += np.sum(loss_matrix)
    exp_scores[range(num_train),y] -= 1  # 取正确标签处，减一。注：必须使用range(num_train)而非：
    dW += np.dot(X.T, exp_scores)  # 3073x500 * 500x10 = 3073x10
	
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    loss /= num_train
    loss += reg * np.sum(W*W)
    dW /= num_train
    dW +=reg *W
    
    return loss, dW

# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
# 和SVM一样，我们使用Frobenius范数来比较两个版本的梯度。
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient difference: %f' % grad_difference)

输出：
和SVM一样，我们使用Frobenius范数来比较两个版本的梯度。

naive loss: 2.354422e+00 computed in 0.128090s
vectorized loss: 2.354422e+00 computed in 0.005005s
Loss difference: 0.000000
Gradient difference: 0.000000

使用验证集来调优超参数(正则化强度和学习率)。您应该尝试不同的学习速率和正则化强度范围;如果您足够小心，您应该能够在验证集上获得超过0.35的分类精度。

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
# 使用验证集设置学习速率和正则化强度。这应该与支持向量机的验证相同;                  #
# 保存最好的训练有素的softmax分类器在best_softmax。                               #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        softmax = Softmax()
        loss_history = softmax.train(X_train, y_train,
                                    learning_rate=learning_rate,
                                    reg=regularization_strength,
                                    num_iters=1500,verbose=True)
        y_train_pred = softmax.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        
        y_val_pred = softmax.predict(X_val)
        val_acc = np.mean(y_val == y_val_pred)
        
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmax
            
            
        results[(learning_rate, regularization_strength)] = [train_acc, val_acc]

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

输出：

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.350510 val accuracy: 0.362000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.329510 val accuracy: 0.337000
lr 5.000000e-07 reg 2.500000e+04 train accuracy: 0.343816 val accuracy: 0.365000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.317388 val accuracy: 0.324000
best validation accuracy achieved during cross-validation: 0.365000

在测试集上评估最好的softmax：

# evaluate on test set
# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % (test_accuracy, ))

输出：

softmax on raw pixels final test set accuracy: 0.351000

lnline Question 2-Ture or False

假设总体训练损失定义为所有训练示例中每个数据点损失的和。可以在训练集中添加一个新的数据点，使SVM的损失保持不变，但Softmax分类器的损失不是这样。
Your Answer： 略… Orz

将每节课所学的权重形象化：

# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

输出：
在这里插入图片描述

一位以泪洗面的同学

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
1
评论
Cs231n作业：Q1-3 Softmax

Cs231n作业：Q1-3 Softmax exerciseSoftmax exercise梯度计算——摘自https://blog.csdn.net/yc461515457/article/details/51924604Softmax Classifierlnline Question 1lnline Question 2-Ture or FalseSoftmax exercise梯度计算...
复制链接

扫一扫