2017 CS231n Assignment 1: Softmax

Latest recommended article published on 2020-10-04 16:01:03

Micheal Parley Lea

Article link: https://blog.csdn.net/weixin_44620028/article/details/102952424

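This article walks through implementing the Softmax classifier for Assignment 1 of the 2017 CS231n course in a Jupyter Notebook. It covers data preprocessing (loading, subsampling, reshaping, and normalization), the computation of the loss function, in particular the elementary nested-loop approach, and discusses the gradient computation and the handling of numerical overflow.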

Launching Jupyter Notebook (Linux Ubuntu)

micheal@Computer:~$ cd assignment1
micheal@Computer:~/assignment1$ source .env/bin/activate
(.env) micheal@Computer:~/assignment1$ jupyter notebook

# Softmax exercise

*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*

This exercise is analogous to the SVM exercise. You will:
# Tasks to complete in this assignment:
- implement a fully-vectorized **loss function** for the Softmax classifier
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** with numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights

In[1] Preparation

from __future__ import print_function
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# Note: from __future__ import print_function must be the first statement

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
# The settings above configure plotting defaults: figure size, interpolation mode, and colormap

In[2] Reading Data & Preprocessing

#1 Load the data and labels, then print their shapes.

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'CIFAR10' # if the default path cannot be read, download the dataset locally and point to it here
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

Print the shapes of the raw data that was just loaded:

    print('Train data shape: ', X_train.shape)
    print('Train labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)

    Train data shape:  (50000, 32, 32, 3)
    Train labels shape:  (50000,)
    Test data shape:  (10000, 32, 32, 3)
    Test labels shape:  (10000,)

#2 Subsample the raw data

From the 50000 training images, take 1000 as the validation set X_val, y_val (not the test set):

    # subsample the data
    # range(49000,49000 + 1000)
    # list(range(49000,49000 + 1000)) = [49000, 49001, ... , 49999]
    mask = list(range(num_training, num_training + num_validation))
    # use the last 1000 of the original 50000 training images as the validation set
    X_val = X_train[mask]
    # use the labels of those last 1000 images as the validation labels
    y_val = y_train[mask]

Update the training data X_train, y_train so that only the first num_training = 49000 images remain:

    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]

From the 10000 test images, take the first num_test = 1000 as the test set:

    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

From the num_training = 49000 training images, randomly pick num_dev = 500 as a small development (dev) set.

np.random.choice(49000, 500, replace=False)  # draws 500 distinct indices (replace=False) from 0, 1, 2, ..., 48999

    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]
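The sampling step above can be tried on a toy array. This is only a sketch: `X_toy`, `idx`, and the fixed seed are illustrative names and values that do not appear in the assignment.

```python
import numpy as np

rng_state = np.random.RandomState(0)           # fixed seed, only so the sketch is repeatable
idx = rng_state.choice(20, 5, replace=False)   # 5 distinct indices drawn from 0..19

X_toy = np.arange(20 * 3).reshape(20, 3)       # 20 toy "images" with 3 features each
X_sub = X_toy[idx]                             # integer-array indexing selects rows in idx order

print(idx.shape, X_sub.shape)                  # (5,) (5, 3)
print(len(np.unique(idx)) == len(idx))         # True: no index is repeated
```

Because the dev set is drawn from the training rows, it overlaps with the training set by construction; it is meant for quick debugging runs rather than for evaluation.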

#3 Reshape the data

Flatten each (32, 32, 3) image into a single row of 3072 values; printing the shapes at this point is a useful check.

    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

    print('Train data shape: ', X_train.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)
    
    Train data shape:  (49000, 3072)
    Test data shape:  (1000, 3072)
    dev data shape:  (500, 3072)
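As a small illustration of what the `-1` in `np.reshape` does, here is a sketch on a toy array (`X_toy` and `X_rows` are made-up names; the image size is shrunk to 4x4x3 to keep it readable):

```python
import numpy as np

X_toy = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)   # 2 toy images of shape (4, 4, 3)
X_rows = np.reshape(X_toy, (X_toy.shape[0], -1))       # -1 lets NumPy infer 4 * 4 * 3 = 48

print(X_rows.shape)                                    # (2, 48)
# Row i is image i flattened in C (row-major) order
print(np.array_equal(X_rows[1], X_toy[1].ravel()))     # True
```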

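#4 Normalize the data

Only centering is applied; the lectures mention that dividing by the standard deviation is generally not done.

The mean is computed over the num_training = 49000 training images, separately for every pixel of every channel (RGB), which gives a mean image of 3072 values. The same training mean is subtracted from the validation, test, and dev sets.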

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis = 0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image

    print('mean_image shape: ', mean_image.shape)
    print('Validation data shape: ', X_val.shape)
    print('Train data shape: ', X_train.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)
    
    mean_image shape:  (3072,)
    Validation data shape:  (1000, 3072)
    Train data shape:  (49000, 3072)
    Test data shape:  (1000, 3072)
    dev data shape:  (500, 3072)
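The centering step can be checked on a tiny matrix. The sketch below uses made-up values (`X_toy`, `X_held_out`) purely to show that `axis=0` averages over examples and that the held-out rows reuse the training mean:

```python
import numpy as np

X_toy = np.array([[1.0, 2.0, 3.0],
                  [3.0, 4.0, 5.0],
                  [5.0, 6.0, 7.0]])          # 3 "training images" with 3 "pixels" each
X_held_out = np.array([[2.0, 2.0, 2.0]])     # stands in for a validation/test row

mean_row = np.mean(X_toy, axis=0)            # one mean per pixel position
X_centered = X_toy - mean_row                # broadcast across the rows
X_held_out_centered = X_held_out - mean_row  # held-out data uses the *training* mean

print(mean_row)                  # [3. 4. 5.]
print(X_centered.mean(axis=0))   # [0. 0. 0.]
print(X_held_out_centered)       # [[-1. -2. -3.]]
```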

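Append a bias dimension: an extra column of ones (1, 1, …, 1), so that the bias can be learned as one more row of W.

np.hstack and np.vstack are two ways of concatenating arrays; for details see [Usage of np.vstack() and np.hstack()](https://blog.csdn.net/m0_37393514/article/details/79538748)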

    # add bias dimension and transform into columns
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
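Why appending a column of ones works can be verified numerically. In this sketch, `W_toy`, `b_toy`, and the shapes are assumptions chosen for illustration; the point is that stacking the bias as the last row of the weight matrix reproduces `X.dot(W) + b`:

```python
import numpy as np

rng = np.random.RandomState(0)               # fixed seed for a repeatable sketch
X_toy = rng.randn(4, 6)                      # 4 examples, 6 features
W_toy = rng.randn(6, 3)                      # weights for 3 classes
b_toy = rng.randn(3)                         # one bias per class

X_aug = np.hstack([X_toy, np.ones((X_toy.shape[0], 1))])   # (4, 7): extra column of ones
W_aug = np.vstack([W_toy, b_toy])                           # (7, 3): bias becomes the last row

print(X_aug.shape, W_aug.shape)                                  # (4, 7) (7, 3)
print(np.allclose(X_aug.dot(W_aug), X_toy.dot(W_toy) + b_toy))   # True
```

This is why the weight matrix later in the notebook has 3073 rows rather than 3072.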

#5 Print the results

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)


Output:

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)

Softmax Classifier

Your code for this section will all be written inside cs231n/classifiers/softmax.py.

In[3] Computing the Loss Function with Nested Loops

First use nested loops (the more elementary approach) to compute the loss function.
Open the file cs231n/classifiers/softmax.py and edit the softmax_loss_naive function.

# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.

from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
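The reason the sanity check compares against -log(0.1) is that with tiny random weights all ten class scores are close to zero, so the softmax assigns roughly 1/10 to every class. A minimal sketch of the limiting case, using exactly zero scores (the variable names are illustrative):

```python
import numpy as np

C = 10                                             # number of CIFAR-10 classes
scores = np.zeros(C)                               # limiting case: all scores equal
probs = np.exp(scores) / np.sum(np.exp(scores))    # uniform distribution, 1/C each
loss = -np.log(probs[3])                           # the correct class index does not matter here

print('%f' % loss)              # 2.302585
print('%f' % (-np.log(0.1)))    # 2.302585
```

A loss far from this value on freshly initialized weights usually points to a bug in the loss computation.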

The naive Softmax loss function: softmax_loss_naive

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  N, D = X.shape
  C = W.shape[1]
  scores = X.dot(W)          # scores[i, j] is the score of example i for class j
  data_loss = 0.0
  dW = np.zeros((D, C))
  for i in range(N):
    # Shift so the largest score is 0: the probabilities are unchanged,
    # but np.exp can no longer overflow.
    scores_i = scores[i, :] - np.max(scores[i, :])
    sum_ij = np.sum(np.exp(scores_i))
    probs = lambda t: np.exp(scores_i[t]) / sum_ij   # softmax probability of class t
    data_loss += -np.log(probs(y[i]))
    for k in range(C):
      # dL_i/dW[:, k] = (probs(k) - 1{k == y[i]}) * X[i]
      dW[:, k] += (probs(k) - (k == y[i])) * X[i]
  data_loss /= N
  loss = data_loss + reg * np.sum(W * W)
  dW /= N
  dW += 2 * reg * W          # the derivative of reg * sum(W * W) is 2 * reg * W

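Note that the score of X[i] for class j is simply X.dot(W)[i, j].
Syntax worth noting (a lambda that closes over the current row's scores, and a boolean used as 0 or 1 in arithmetic):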

probs = lambda t : np.exp(scores_i[t]) / sum_ij
(probs_k - (k == y[i]))
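Both idioms can be tried in isolation. In this sketch the score values and the label `y_i` are made up; it shows that the lambda evaluates the softmax probability on demand and that `(k == y_i)` behaves as 1 for the correct class and 0 otherwise:

```python
import numpy as np

scores_i = np.array([1.0, 2.0, 3.0])          # made-up scores for one example, 3 classes
sum_ij = np.sum(np.exp(scores_i))

# The lambda closes over scores_i and sum_ij and computes one probability per call
probs = lambda t: np.exp(scores_i[t]) / sum_ij
total = sum(probs(t) for t in range(3))       # probabilities sum to 1

y_i = 2                                       # made-up correct class
# True/False act as 1/0 in arithmetic, so only the correct class gets the "- 1"
diffs = [probs(k) - (k == y_i) for k in range(3)]

print(np.isclose(total, 1.0))     # True
print(diffs[y_i] < 0)             # True: probs(y_i) - 1 is negative
print(np.isclose(sum(diffs), 0))  # True: the coefficients sum to 1 - 1 = 0
```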

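The loss function and its gradient are derived as follows:

Loss function
The existing solutions found online all take numerical overflow into account:

cs231n assignment: assignment1 - softmax
Notes: CS231n + assignment1 (Assignment 1)
cs231n assignment: assignment1 - softmax (直上云霄)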

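$\texttt{loss} = \texttt{data\_loss} + \lambda\,\texttt{Reg(W)} = \frac{1}{N}\sum_{i=0}^{N-1} L_i + \lambda\,\texttt{Reg(W)}$
where $L_i = -\ln\left(\frac{e^{s_{i,y_i} - \max_k s_{i,k}}}{\sum_j e^{s_{i,j} - \max_k s_{i,k}}}\right)$, $s_{i,j} = \texttt{X[i]}\cdot\texttt{W[:,j]}$ is the score of example $i$ for class $j$, and $\texttt{Reg(W)} = \sum \texttt{W}^2$ as in the code. Subtracting $\max_k s_{i,k}$ multiplies the numerator and the denominator by the same factor, so $L_i$ is unchanged while the exponentials can no longer overflow.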
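The overflow that the max-subtraction guards against is easy to reproduce. The sketch below uses deliberately large, made-up scores; the unshifted softmax produces `nan`, while the shifted version matches the softmax of the same scores moved down by a constant:

```python
import numpy as np

scores = np.array([1000.0, 1001.0, 1002.0])   # made-up scores, large enough to overflow exp

# Unshifted version: exp(1000) overflows to inf, and inf / inf gives nan
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(scores) / np.sum(np.exp(scores))

# Shifted version: the largest exponent is 0, so nothing overflows
shifted = scores - np.max(scores)
stable = np.exp(shifted) / np.sum(np.exp(shifted))

# Reference: softmax is invariant to adding a constant to every score
small = np.array([0.0, 1.0, 2.0])
reference = np.exp(small) / np.sum(np.exp(small))

print(np.isnan(naive).all())            # True
print(np.allclose(stable, reference))   # True
```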

Gradient
Write $\texttt{probs}_i(k)$ for the softmax probability of class $k$ on example $i$ and $\texttt{scores\_i[k]} = \texttt{X[i]}\cdot\texttt{W[:,k]}$, so that $L_i = -\ln \texttt{probs}_i(\texttt{y[i]})$. The chain rule gives:
$\frac{\partial \texttt{loss}}{\partial \texttt{W[:,j]}} = \frac{1}{N}\sum_{i=0}^{N-1}\frac{\partial L_i}{\partial \texttt{W[:,j]}} + \lambda\,\frac{\partial \texttt{Reg(W)}}{\partial \texttt{W[:,j]}}$
$\frac{\partial L_i}{\partial \texttt{W[:,j]}} = \frac{\partial L_i}{\partial \texttt{probs}_i(\texttt{y[i]})} \times \frac{\partial \texttt{probs}_i(\texttt{y[i]})}{\partial \texttt{W[:,j]}}$

$\frac{\partial L_i}{\partial \texttt{probs}_i(\texttt{y[i]})} = -\frac{1}{\texttt{probs}_i(\texttt{y[i]})}$

By the quotient rule:
$\frac{\partial \texttt{probs}_i(\texttt{y[i]})}{\partial \texttt{W[:,j]}} = \frac{e^{\texttt{scores\_i[y[i]]}}}{\sum_{k} e^{\texttt{scores\_i[k]}}} \frac{\partial \texttt{scores\_i[y[i]]}}{\partial \texttt{W[:,j]}} - \frac{e^{\texttt{scores\_i[y[i]]}}}{\left(\sum_{k} e^{\texttt{scores\_i[k]}}\right)^2} \sum_{k} \left( e^{\texttt{scores\_i[k]}} \frac{\partial \texttt{scores\_i[k]}}{\partial \texttt{W[:,j]}} \right)$
$\frac{\partial \texttt{scores\_i[k]}}{\partial \texttt{W[:,j]}} = \frac{\partial\, \texttt{X[i]W[:,k]}}{\partial \texttt{W[:,j]}} = \texttt{X[i]}\:\:,\:\: (\texttt{k == j}) \qquad \frac{\partial \texttt{scores\_i[k]}}{\partial \texttt{W[:,j]}} = 0\:\:,\:\: (\texttt{k != j})$
Substituting, only the $k = j$ term of the sum survives:
$\frac{\partial \texttt{probs}_i(\texttt{y[i]})}{\partial \texttt{W[:,j]}} = \texttt{probs}_i(\texttt{y[i]}) \left( (\texttt{y[i] == j}) - \texttt{probs}_i(\texttt{j}) \right) \texttt{X[i]}$
$\frac{\partial L_i}{\partial \texttt{W[:,j]}} = \left( \texttt{probs}_i(\texttt{j}) - (\texttt{y[i] == j}) \right) \texttt{X[i]}$
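The final formula can be checked against a centered-difference numerical gradient, in the spirit of the gradient check the assignment asks for. The sketch below is self-contained and does not use the course's own checking utilities; `softmax_loss`, the toy shapes, and the step size `h` are assumptions made for illustration, and regularization is left out so that only the data term is tested.

```python
import numpy as np

def softmax_loss(W, X, y):
    """Mean softmax cross-entropy loss and its gradient (no regularization)."""
    N = X.shape[0]
    scores = X.dot(W)
    scores -= scores.max(axis=1, keepdims=True)     # shift each row for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)       # probs[i, k] = softmax probability
    loss = -np.log(probs[np.arange(N), y]).mean()
    coeff = probs.copy()
    coeff[np.arange(N), y] -= 1.0                   # probs[i, j] - (y[i] == j)
    dW = X.T.dot(coeff) / N                         # sums coeff[i, j] * X[i] over i
    return loss, dW

rng = np.random.RandomState(0)                      # fixed seed for a repeatable sketch
X_toy = rng.randn(5, 4)                             # 5 examples, 4 features
y_toy = rng.randint(3, size=5)                      # labels in {0, 1, 2}
W_toy = rng.randn(4, 3) * 0.01

_, dW = softmax_loss(W_toy, X_toy, y_toy)

# Centered differences: perturb one weight at a time by +/- h
h = 1e-5
dW_num = np.zeros_like(W_toy)
for idx in np.ndindex(*W_toy.shape):
    W_plus = W_toy.copy()
    W_plus[idx] += h
    W_minus = W_toy.copy()
    W_minus[idx] -= h
    loss_plus = softmax_loss(W_plus, X_toy, y_toy)[0]
    loss_minus = softmax_loss(W_minus, X_toy, y_toy)[0]
    dW_num[idx] = (loss_plus - loss_minus) / (2 * h)

max_err = np.max(np.abs(dW - dW_num))
print(max_err < 1e-6)   # True when the analytic gradient matches the numerical one
```

The line `dW = X.T.dot(coeff) / N` is the same sum as the double loop over `i` and `k` in the naive implementation, written as a single matrix product.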