CS231n Assignment: Q1-3 Softmax

Softmax exercise

Gradient derivation (adapted from https://blog.csdn.net/yc461515457/article/details/51924604)

First, the loss function:

L = \frac{1}{N} \sum_i L_i + \lambda R(W) \qquad (1)

There are N samples in total, and each sample contributes a loss L_i:

L_i = -\log p_{y_i} = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right) = -f_{y_i} + \log \sum_j e^{f_j} \qquad (2)

For each sample X_i, the softmax denominator sums over all of the scores f_j, so the derivative of L_i has a contribution in every column of W, i.e. \frac{\partial L_i}{\partial W_j} is nonzero for all j:

When j \neq y_i: \frac{\partial L_i}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}} \frac{\partial f_j}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}} X_i^T \qquad (3)

When j = y_i: \frac{\partial L_i}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}} \frac{\partial f_j}{\partial W_j} - \frac{\partial f_{y_i}}{\partial W_j} = \frac{e^{f_j}}{\sum_j e^{f_j}} X_i^T - X_i^T \qquad (4)
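As a quick sanity check on equations (3) and (4), the small sketch below (not part of the assignment; shapes and values are made up) compares the analytic per-sample gradient against a centered-difference numerical gradient:

# Quick check of equations (3)-(4) on a tiny made-up example: D=4 features, C=3 classes.
import numpy as np

np.random.seed(0)
D, C = 4, 3
W = np.random.randn(D, C) * 0.01
x = np.random.randn(D)   # a single sample X_i
y = 1                    # its (made-up) correct class

def loss_single(W):
    f = x.dot(W)                       # scores, shape (C,)
    f = f - np.max(f)                  # shift for numerical stability
    p = np.exp(f) / np.sum(np.exp(f))  # softmax probabilities
    return -np.log(p[y])

# Analytic gradient from (3)-(4): dL_i/dW_j = (p_j - 1{j == y_i}) * X_i
f = x.dot(W)
f -= np.max(f)
p = np.exp(f) / np.sum(np.exp(f))
dW_analytic = np.outer(x, p)
dW_analytic[:, y] -= x

# Centered-difference numerical gradient
h = 1e-5
dW_numeric = np.zeros_like(W)
for i in range(D):
    for j in range(C):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += h
        Wm[i, j] -= h
        dW_numeric[i, j] = (loss_single(Wp) - loss_single(Wm)) / (2 * h)

print(np.max(np.abs(dW_analytic - dW_numeric)))  # should be on the order of 1e-9 or smaller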
This exercise is analogous to the SVM exercise. You will:
• implement a fully-vectorized loss function for the Softmax classifier
• implement the fully-vectorized expression for its analytic gradient
• check your implementation with numerical gradient checking
• use a validation set to tune the learning rate and regularization strength
• optimize the loss function with SGD
• visualize the final learned weights

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

Load the CIFAR-10 data:

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    
    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
    try:
       del X_train, y_train
       del X_test, y_test
       print('Clear previously loaded data.')
    except:
       pass

    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))  # indices 49000-49999: 1000 examples
    X_val = X_train[mask]  # the 1000 examples after the training split, used for validation
    y_val = y_train[mask]
    mask = list(range(num_training))  # 49000 indices
    X_train = X_train[mask]  # the first 49000 examples, used for training
    y_train = y_train[mask]
    mask = list(range(num_test))  # 1000 indices
    X_test = X_test[mask]  # the first 1000 test examples, used for testing
    y_test = y_test[mask]
    mask = np.random.choice(num_training, num_dev, replace=False)  # 500 examples drawn without replacement from the 49000 training examples, used as the dev set
    X_dev = X_train[mask]
    y_dev = y_train[mask]
    
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    
    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis = 0)  # per-feature mean over the training set
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
    
    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)

Output:

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)
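The feature dimension 3073 is 32*32*3 = 3072 pixel values plus one appended constant: the column of ones added by np.hstack folds the bias into the weight matrix, so scores can be computed as a single product X.dot(W). A toy illustration (made-up numbers, not assignment code):

import numpy as np
X_toy = np.arange(6, dtype=float).reshape(2, 3)            # 2 samples, 3 features
X_toy = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])   # bias trick: shape becomes (2, 4)
W_toy = np.zeros((4, 5))
W_toy[-1, :] = 7.0                                          # the last row of W plays the role of the bias b
print(X_toy.dot(W_toy))                                     # every score is offset by the bias 7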

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.

Open the file cs231n/classifiers/softmax.py and implement the softmax_loss_naive function.

from builtins import range
import numpy as np
from random import shuffle
from past.builtins import xrange

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
        - W: A numpy array of shape (D, C) containing weights.
        - X: A numpy array of shape (N, D) containing a minibatch of data.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c means
          that X[i] has label c, where 0 <= c < C.
        - reg: (float) regularization strength
    Returns a tuple of:
        - loss as a single float
        - gradient with respect to weights W; an array of the same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)  # 3073x10

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.     #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = X.shape[0]  # number of examples (500 for the dev set)
    num_class = W.shape[1]  # number of classes (10)
    for i in range(num_train):
        scores_i = X[i].dot(W)  # 1x3073 * 3073x10 = 1x10
        scores_i -= np.max(scores_i)  # shift for numerical stability; leaves the softmax unchanged
        exp_scores = np.exp(scores_i)  # e^scores, 1x10
        sum_scores = np.sum(exp_scores)  # sum of the exponentiated scores, scalar
        exp_scores /= sum_scores  # normalized probabilities, 1x10
        correct_score = exp_scores[y[i]]  # probability assigned to the true label, scalar
        loss += -np.log(correct_score)  # loss contributed by this sample
        for j in range(num_class):
            if j != y[i]:
                dW[:, j] += exp_scores[j] * X[i]
            else:
                dW[:, j] += (exp_scores[y[i]] - 1) * X[i]

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    loss /= num_train  # average loss over the minibatch
    dW /= num_train
    loss += reg * np.sum(W * W)  # add regularization to get the full loss
    dW += reg * W
    return loss, dW
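The shift by np.max(scores_i) in the loop above is the standard trick for numerical stability: np.exp overflows for large scores, and subtracting a constant from every score leaves the softmax probabilities unchanged. A tiny illustration with made-up scores:

import numpy as np
f = np.array([123.0, 456.0, 789.0])
print(np.exp(f) / np.sum(np.exp(f)))  # overflow: inf/nan and a RuntimeWarning
f -= np.max(f)                        # same softmax, but now numerically safe
print(np.exp(f) / np.sum(np.exp(f)))  # approximately [0, 0, 1]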
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))

Output:

loss: 2.354422
sanity check: 2.302585

Inline Question 1

Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your Answer:
From L_i = -\log\left( \frac{e^{s_{y_i}}}{\sum_j e^{s_j}} \right): because W is initialized with small random values and there are 10 classes, every class is assigned roughly equal probability, so the correct class gets probability about 1/10 and the loss is approximately -log(0.1).

# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

As with the SVM, we use numerical gradient checking as a debugging tool; the numerical gradient should be close to the analytic gradient. The first ten lines below are the check without regularization; the last ten repeat it with regularization (reg = 5e1):

numerical: 1.895536 analytic: 1.895536, relative error: 4.075329e-08
numerical: 1.145960 analytic: 1.145960, relative error: 7.754185e-09
numerical: 0.520816 analytic: 0.520816, relative error: 1.751680e-09
numerical: 0.019831 analytic: 0.019830, relative error: 1.837633e-06
numerical: -1.927560 analytic: -1.927560, relative error: 3.272616e-09
numerical: -2.331816 analytic: -2.331816, relative error: 1.606302e-08
numerical: 1.795485 analytic: 1.795485, relative error: 4.357714e-08
numerical: 2.946119 analytic: 2.946119, relative error: 2.266958e-08
numerical: 0.776307 analytic: 0.776307, relative error: 3.997289e-09
numerical: 2.914070 analytic: 2.914070, relative error: 6.318577e-09
numerical: -0.553924 analytic: -0.549033, relative error: 4.434225e-03
numerical: 0.000276 analytic: -0.001898, relative error: 1.000000e+00
numerical: 0.949624 analytic: 0.939115, relative error: 5.563964e-03
numerical: 1.651383 analytic: 1.655799, relative error: 1.335457e-03
numerical: -0.736941 analytic: -0.741462, relative error: 3.057579e-03
numerical: 2.944667 analytic: 2.947244, relative error: 4.372780e-04
numerical: 3.311702 analytic: 3.317417, relative error: 8.620659e-04
numerical: -1.625303 analytic: -1.628303, relative error: 9.221845e-04
numerical: 1.819605 analytic: 1.810454, relative error: 2.520883e-03
numerical: 0.428718 analytic: 0.432026, relative error: 3.843106e-03
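grad_check_sparse samples a handful of coordinates of W and compares the analytic gradient with a centered-difference estimate at each one. A stripped-down sketch of the same idea (my own version, not the course's exact implementation; the relative-error formula is one common convention):

import numpy as np

def sparse_grad_check(f, W, grad_analytic, num_checks=10, h=1e-5):
    """Compare grad_analytic with a centered-difference estimate at a few random coordinates of W."""
    for _ in range(num_checks):
        ix = tuple(np.random.randint(d) for d in W.shape)
        old = W[ix]
        W[ix] = old + h
        fxph = f(W)                 # f(x + h)
        W[ix] = old - h
        fxmh = f(W)                 # f(x - h)
        W[ix] = old                 # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        rel_error = abs(grad_numerical - grad_analytic[ix]) / \
                    (abs(grad_numerical) + abs(grad_analytic[ix]) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic[ix], rel_error))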

Now that we have a naive implementation of the softmax loss function and its gradient, implement a vectorized version in softmax_loss_vectorized. The two versions should compute the same results, but the vectorized version should be much faster.

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    scores = X.dot(W)  # score matrix: 500x3073 * 3073x10 = 500x10
    scores -= np.max(scores, axis=1, keepdims=True)  # shift each row for numerical stability
    exp_scores = np.exp(scores)  # e^scores, 500x10
    sum_scores = np.sum(exp_scores, axis=1)  # per-sample sum over the classes, (500,)
    exp_scores /= sum_scores[:, np.newaxis]  # normalized probabilities, 500x10

    loss_matrix = -np.log(exp_scores[range(num_train), y])  # per-sample losses, (500,)
    loss += np.sum(loss_matrix)
    exp_scores[range(num_train), y] -= 1  # subtract 1 at the correct-class entries; note: index the rows with range(num_train), not ':'
    dW += np.dot(X.T, exp_scores)  # 3073x500 * 500x10 = 3073x10

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    loss /= num_train
    loss += reg * np.sum(W * W)
    dW /= num_train
    dW += reg * W

    return loss, dW
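In matrix form, the data term of the gradient computed above is just equations (3) and (4) stacked over samples and classes: with P the N×C matrix of softmax probabilities and Y the N×C one-hot encoding of the labels,

\frac{\partial L_{\text{data}}}{\partial W} = \frac{1}{N} X^T (P - Y)

which is exactly what np.dot(X.T, exp_scores) computes after 1 has been subtracted from the correct-class entries of exp_scores (and before dividing by num_train).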
# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient difference: %f' % grad_difference)

As with the SVM, we use the Frobenius norm to compare the two versions of the gradient. Output:

naive loss: 2.354422e+00 computed in 0.128090s
vectorized loss: 2.354422e+00 computed in 0.005005s
Loss difference: 0.000000
Gradient difference: 0.000000
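The comment in softmax_loss_vectorized about indexing with range(num_train) rather than ':' is worth spelling out: A[range(N), y] picks one entry per row (row i, column y[i]), while A[:, y] gathers whole columns and yields an N×N array. A toy example:

import numpy as np
A = np.arange(12).reshape(3, 4)
y_toy = np.array([1, 0, 3])
print(A[range(3), y_toy])   # [ 1  4 11] -> one entry per row: A[0,1], A[1,0], A[2,3]
print(A[:, y_toy].shape)    # (3, 3)     -> columns 1, 0, 3 of every row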

Use the validation set to tune hyperparameters (regularization strength and learning rate). You should experiment with different ranges for the learning rates and regularization strengths; if you are careful you should be able to get a classification accuracy of over 0.35 on the validation set.

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

for learning_rate in learning_rates:
    for regularization_strength in regularization_strengths:
        softmax = Softmax()
        loss_history = softmax.train(X_train, y_train,
                                    learning_rate=learning_rate,
                                    reg=regularization_strength,
                                    num_iters=1500,verbose=True)
        y_train_pred = softmax.predict(X_train)
        train_acc = np.mean(y_train == y_train_pred)
        
        y_val_pred = softmax.predict(X_val)
        val_acc = np.mean(y_val == y_val_pred)
        
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = softmax
            
            
        results[(learning_rate, regularization_strength)] = [train_acc, val_acc]

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

Output:

lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.350510 val accuracy: 0.362000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.329510 val accuracy: 0.337000
lr 5.000000e-07 reg 2.500000e+04 train accuracy: 0.343816 val accuracy: 0.365000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.317388 val accuracy: 0.324000
best validation accuracy achieved during cross-validation: 0.365000
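Validation accuracy tops out around 0.365 with this coarse 2x2 grid. One way to push it higher (my own suggestion, not part of the assignment) is to search a finer grid around the best values found above, for example with np.logspace:

import numpy as np
# Hypothetical finer grid centered near the best result above (lr = 5e-7, reg = 2.5e4)
learning_rates = np.logspace(-7.5, -6.0, 6)           # roughly 3e-8 ... 1e-6
regularization_strengths = np.logspace(3.5, 4.7, 6)   # roughly 3e3 ... 5e4
# Re-run the same training/validation loop as above over these ranges
# and keep the classifier with the best validation accuracy in best_softmax.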

Evaluate the best softmax classifier on the test set:

# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % (test_accuracy, ))

Output:

softmax on raw pixels final test set accuracy: 0.351000

Inline Question 2 - True or False

Suppose the overall training loss is defined as the sum of the per-datapoint loss over all training examples. It is possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.
Your Answer: skipped… Orz

Visualize the learned weights for each class:

# Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)

w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

Output:
(Figure: the learned weight templates for the 10 CIFAR-10 classes, shown as a 2x5 grid of images.)
