Launching Jupyter Notebook (Linux Ubuntu)
micheal@Computer:~$ cd assignment1
micheal@Computer:~/assignment1$ source .env/bin/activate
(.env) micheal@Computer:~/assignment1$ jupyter notebook
# Softmax exercise
*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*
This exercise is analogous to the SVM exercise. You will:
#Tasks to complete in this assignment
- implement a fully-vectorized **loss function** for the Softmax classifier
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** with numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights
In[1] Preparation
from __future__ import print_function
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
# Note: from __future__ import print_function must be the first statement in the cell
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
# The lines above set a few plotting defaults: figure size, interpolation mode, and colormap
In[2] Reading Data & Preprocessing
#1 Load the data and labels, then print their shapes.
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the linear classifier. These are the same steps as we used for the
    SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'CIFAR10'  # If the files cannot be read from the default path, download the dataset to a local directory.
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
Print the shapes of the raw data that was just loaded:
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Train data shape: (50000, 32, 32, 3)
Train labels shape: (50000,)
Test data shape: (10000, 32, 32, 3)
Test labels shape: (10000,)
#2 Subsample the data
From the 50,000 training images, take 1,000 as the validation set X_val, y_val.
# subsample the data
# range(49000,49000 + 1000)
# list(range(49000,49000 + 1000)) = [49000, 49001, ... , 49999]
mask = list(range(num_training, num_training + num_validation))
# Use the last 1,000 of the original 50,000 training images as the validation set
X_val = X_train[mask]
# Use the labels of those last 1,000 training images as the validation labels
y_val = y_train[mask]
Update the training data X_train, y_train so that it keeps only the first num_training = 49000 images:
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]
From the 10,000 test images, take the first num_test = 1000 as the test set:
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]
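From the num_training = 49000 training images, randomly pick num_dev = 500 as a small development (dev) set.
# np.random.choice(49000, 500, replace=False) draws 500 distinct integers from 0, 1, 2, ..., 48999 (replace=False means no repeats)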
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]
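To make the index masks above concrete, here is a minimal sketch of the same split on a toy array. The sizes (8 / 2 / 3) and the fake data are assumptions chosen for illustration; they stand in for 49000 / 1000 / 500 and the CIFAR-10 arrays.

```python
import numpy as np

# Hypothetical toy sizes standing in for 49000 / 1000 / 500 in the post.
num_training, num_validation, num_dev = 8, 2, 3

# Ten fake "images" with 4 features each; the label of sample i is simply i.
X = np.arange(10 * 4).reshape(10, 4)
y = np.arange(10)

# Validation set: the last num_validation samples.
mask = list(range(num_training, num_training + num_validation))
X_val, y_val = X[mask], y[mask]

# Training set: the first num_training samples.
mask = list(range(num_training))
X_train, y_train = X[mask], y[mask]

# Dev set: a random subset of the training set, drawn without replacement.
dev_mask = np.random.choice(num_training, num_dev, replace=False)
X_dev, y_dev = X_train[dev_mask], y_train[dev_mask]

print(y_val)           # [8 9]
print(X_train.shape)   # (8, 4)
print(sorted(y_dev))   # three distinct labels, all below 8
```

Because the dev set is drawn from the training indices only, it never overlaps the validation set.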
#3 Reshape the data
Flatten each image of shape (32, 32, 3) into a single row of 3072 values. You can print the shapes here to check the result.
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
print('Train data shape: ', X_train.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
Train data shape: (49000, 3072)
Test data shape: (1000, 3072)
dev data shape: (500, 3072)
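As a small illustration of the reshape step, the sketch below flattens two fake images. The array contents are made up; only the (N, 32, 32, 3) layout is taken from the shapes printed above.

```python
import numpy as np

# Two fake 32x32 RGB images with the same layout as the CIFAR-10 arrays.
X = np.zeros((2, 32, 32, 3))
X[1, 0, 0, :] = [1.0, 2.0, 3.0]   # mark the first pixel of the second image

# -1 lets NumPy infer the second dimension: 32 * 32 * 3 = 3072.
X_rows = np.reshape(X, (X.shape[0], -1))

print(X_rows.shape)    # (2, 3072)
print(X_rows[1, :3])   # [1. 2. 3.]
```

With NumPy's default row-major order, the three channel values of a pixel end up next to each other in the flattened row.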
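#4 Normalize the data
Only mean-centering is applied; as mentioned in the lectures, we generally do not divide by the standard deviation.
The mean is computed over the num_training = 49000 training images, separately for every pixel of every channel (RGB), which gives one mean value per column of X_train.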
# Normalize the data: subtract the mean image
mean_image = np.mean(X_train, axis = 0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
print('mean_image shape: ', mean_image.shape)
print('Validation data shape: ', X_val.shape)
print('Train data shape: ', X_train.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)
mean_image shape: (3072,)
Validation data shape: (1000, 3072)
Train data shape: (49000, 3072)
Test data shape: (1000, 3072)
dev data shape: (500, 3072)
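The mean subtraction can be checked on a tiny example. The numbers below are invented for illustration; the point is that the mean is taken over the training rows only and then reused for every other split.

```python
import numpy as np

X_train = np.array([[0.0, 10.0],
                    [2.0, 20.0],
                    [4.0, 30.0]])
X_val = np.array([[2.0, 20.0]])

# One mean per column (per "pixel"), computed on the training set only.
mean_image = np.mean(X_train, axis=0)

# Broadcasting subtracts the same mean row from every sample.
X_train_c = X_train - mean_image
X_val_c = X_val - mean_image   # the validation data reuses the training mean

print(mean_image)               # [ 2. 20.]
print(X_train_c.mean(axis=0))   # [0. 0.]
print(X_val_c)                  # [[0. 0.]]
```

The in-place form `X_train -= mean_image` used above relies on the arrays having a floating-point dtype; on an integer array NumPy would refuse the in-place cast.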
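Append the bias dimension: an extra column of ones (1, 1, …, 1), so that the bias can be learned as one more row of W.
np.hstack and np.vstack are two ways of concatenating arrays; for details see [usage of np.vstack() and np.hstack()](https://blog.csdn.net/m0_37393514/article/details/79538748)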
# add bias dimension and transform into columns
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev
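The reason for appending a column of ones is the so-called bias trick: the affine map X·W + b becomes a single matrix product. A minimal sketch, using small random arrays whose names (`W`, `b`, `X_aug`, `W_aug`) are chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # 4 samples, 3 features
W = rng.standard_normal((3, 2))   # weights for 2 classes
b = rng.standard_normal(2)        # one bias per class

# Append a column of ones to X and the bias as an extra row of W.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # shape (4, 4)
W_aug = np.vstack([W, b])                          # shape (4, 2)

# The augmented product reproduces the affine scores exactly.
print(np.allclose(X_aug @ W_aug, X @ W + b))       # True
```

This is why the weight matrix created later has 3073 rows rather than 3072.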
#5 Print the results
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)
print('dev labels shape: ', y_dev.shape)
Output:
Train data shape: (49000, 3073)
Train labels shape: (49000,)
Validation data shape: (1000, 3073)
Validation labels shape: (1000,)
Test data shape: (1000, 3073)
Test labels shape: (1000,)
dev data shape: (500, 3073)
dev labels shape: (500,)
Softmax Classifier
Your code for this section will all be written inside cs231n/classifiers/softmax.py.
In[3] Computing the Loss Function with Nested Loops
- Start with nested loops (the more basic approach)
- to compute the loss function
- Open the file cs231n/classifiers/softmax.py and edit the softmax_loss_naive function.
# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
from cs231n.classifiers.softmax import softmax_loss_naive
import time
# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
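The sanity value -log(0.1) follows from the tiny initial weights: all ten scores are close to zero, so the softmax probabilities are close to 1/10 each. The sketch below reproduces this with random Gaussian data standing in for X_dev (an assumption; the real CIFAR-10 features have a different scale) and a vectorized loss written only for this check.

```python
import numpy as np

num_classes = 10
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3073))               # stand-in for X_dev
W = rng.standard_normal((3073, num_classes)) * 0.0001
y = rng.integers(0, num_classes, size=500)         # random labels

scores = X @ W
scores -= scores.max(axis=1, keepdims=True)        # stability shift
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(500), y]).mean()

print(loss, -np.log(0.1))   # the two values should be close
```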
- The naive softmax loss function: softmax_loss_naive
#############################################################################
# TODO: Compute the softmax loss and its gradient using explicit loops. #
# Store the loss in loss and the gradient in dW. If you are not careful #
# here, it is easy to run into numeric instability. Don't forget the #
# regularization! #
#############################################################################
#pass
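data_loss = 0.0
(N, D) = X.shape
C = W.shape[1]
scores = X.dot(W)
dW = np.zeros((D, C))
for i in range(N):
    # Shift the scores of sample i so the largest is 0 (numerical stability)
    scores_i = scores[i, :] - np.max(scores[i, :])
    sum_ij = np.sum(np.exp(scores_i))
    # probs(t): softmax probability that sample i belongs to class t
    probs = lambda t: np.exp(scores_i[t]) / sum_ij
    data_loss += -np.log(probs(y[i]))
    for k in range(C):
        probs_k = probs(k)
        # (k == y[i]) is a boolean that acts as 1 for the correct class, else 0
        dW[:, k] += (probs_k - (k == y[i])) * X[i]
data_loss /= N
loss = data_loss + reg * np.sum(W * W)
dW /= N
# The gradient of reg * np.sum(W * W) with respect to W is 2 * reg * W
dW += 2 * reg * W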
One thing to note here: the score of X[i] for class j is X.dot(W)[i,j].
Syntax worth noting:
probs = lambda t : np.exp(scores_i[t]) / sum_ij
(probs_k - (k == y[i]))
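Both pieces of syntax can be tried in isolation. In the sketch below the score vector and the label `y_i` are made-up values; the lambda is an inline function of the class index, and the comparison `k == y_i` yields a bool that NumPy treats as 1 or 0 in arithmetic.

```python
import numpy as np

scores_i = np.array([1.0, 2.0, 3.0])      # hypothetical scores of one sample
scores_i = scores_i - np.max(scores_i)    # stability shift
sum_ij = np.sum(np.exp(scores_i))

# Inline function: softmax probability of class t for this sample.
probs = lambda t: np.exp(scores_i[t]) / sum_ij

y_i = 2   # hypothetical correct class
coeffs = []
for k in range(3):
    indicator = (k == y_i)            # True only for the correct class
    coeffs.append(probs(k) - indicator)
    print(k, indicator, coeffs[-1])

# The coefficients sum to zero because the probabilities sum to one.
print(np.isclose(sum(coeffs), 0.0))   # True
```

Only the coefficient of the correct class is negative, so a gradient step raises that class's score and lowers the others.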
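The derivations of the loss function and of its gradient are as follows:
- Loss function
The existing solutions I looked at online all take numerical overflow into account: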
cs231n作业:assignment1 - softmax
笔记:CS231n+assignment1(作业一)
cs231n作业:assignment1 - softmax 直上云霄
$$\mathtt{loss} = \mathtt{data\_loss} + \lambda\,\mathtt{Reg}(W) = \frac{1}{N}\sum_{i=0}^{N-1} L_i + \lambda\,\mathtt{Reg}(W)$$
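where, writing $s_i = \texttt{X[i]}\,W$ for the score vector of sample $i$ (`scores_i` in the code),

$$L_i = -\ln\left(\frac{e^{\,s_i[y_i] - \max_k s_i[k]}}{\sum_j e^{\,s_i[j] - \max_k s_i[k]}}\right)$$

Subtracting $\max_k s_i[k]$ from every score leaves the ratio unchanged, but keeps every exponent at or below zero, so the exponentials cannot overflow.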
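The effect of the max shift is easy to see numerically. The two helper functions below are hypothetical names written for this comparison; they are not part of the assignment code.

```python
import numpy as np

def softmax_unstable(s):
    """Softmax computed directly from the raw scores."""
    e = np.exp(s)
    return e / np.sum(e)

def softmax_stable(s):
    """Softmax computed after subtracting the largest score."""
    e = np.exp(s - np.max(s))
    return e / np.sum(e)

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

# For moderate scores both versions agree.
print(np.allclose(softmax_unstable(small), softmax_stable(small)))   # True

# For large scores np.exp overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_unstable(large))   # [nan nan nan]

# The shifted version is unaffected: only score differences matter.
print(np.allclose(softmax_stable(large), softmax_stable(small)))     # True
```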
- Gradient
In the gradient derivation, $\texttt{data\_loss}$ denotes the running sum $\sum_i L_i$ accumulated inside the loop, before the division by $N$, so that $\texttt{loss} = \frac{1}{N}\,\texttt{data\_loss} + \lambda\,\texttt{Reg(W)}$.

$$\frac{\partial \texttt{loss}}{\partial \texttt{W[:,j]}} = \frac{\partial \texttt{loss}}{\partial \texttt{data\_loss}} \times \frac{\partial \texttt{data\_loss}}{\partial \texttt{W[:,j]}} + \frac{\partial \texttt{loss}}{\partial \texttt{Reg(W)}} \times \frac{\partial \texttt{Reg(W)}}{\partial \texttt{W[:,j]}}$$

$$= \frac{\partial \texttt{loss}}{\partial \texttt{data\_loss}} \times \sum_{i = 0}^{N-1}\left( \frac{\partial \texttt{data\_loss}}{\partial \texttt{probs(y[i])}} \times \frac{\partial \texttt{probs(y[i])}}{\partial \texttt{W[:,j]}} \right) + \frac{\partial \texttt{loss}}{\partial \texttt{Reg(W)}} \times \frac{\partial \texttt{Reg(W)}}{\partial \texttt{W[:,j]}}$$

$$\frac{\partial \texttt{loss}}{\partial \texttt{data\_loss}} = \frac{1}{N}$$

$$\frac{\partial \texttt{data\_loss}}{\partial \texttt{probs(y[i])}} = - \frac{1}{\texttt{probs(y[i])}}$$
Applying the quotient rule to $\texttt{probs(y[i])} = e^{\texttt{scores\_i[y[i]]}} / \sum_k e^{\texttt{scores\_i[k]}}$, where $k$ runs over all classes:

$$\frac{\partial \texttt{probs(y[i])}}{\partial \texttt{W[:,j]}} = \frac{e^{\texttt{scores\_i[y[i]]}}}{\sum_{k} e^{\texttt{scores\_i[k]}}} \, \frac{\partial \texttt{scores\_i[y[i]]}}{\partial \texttt{W[:,j]}} - \frac{e^{\texttt{scores\_i[y[i]]}}}{\left(\sum_{k} e^{\texttt{scores\_i[k]}}\right)^2} \times \sum_{k} \left( e^{\texttt{scores\_i[k]}} \, \frac{\partial \texttt{scores\_i[k]}}{\partial \texttt{W[:,j]}} \right)$$
Each score depends on only one column of W:

$$\frac{\partial \texttt{scores\_i[k]}}{\partial \texttt{W[:,j]}} = \frac{\partial\, \texttt{X[i]W[:,k]}}{\partial \texttt{W[:,j]}} = \begin{cases} \texttt{X[i]}, & \texttt{k == j} \\ 0, & \texttt{k != j} \end{cases}$$

In particular, for $\texttt{k = y[i]}$ the derivative is $\texttt{X[i]}$ when $\texttt{y[i] == j}$ and $0$ when $\texttt{y[i] != j}$.
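Substituting the score derivatives, only the $\texttt{k == j}$ term of the sum survives, which gives

$$\frac{\partial \texttt{probs(y[i])}}{\partial \texttt{W[:,j]}} = \texttt{probs(y[i])} \left( \texttt{(y[i] == j)} - \texttt{probs(j)} \right) \texttt{X[i]}$$

Multiplying by $\frac{\partial \texttt{data\_loss}}{\partial \texttt{probs(y[i])}} = -\frac{1}{\texttt{probs(y[i])}}$ yields the contribution of sample $i$ to the gradient:

$$\frac{\partial L_i}{\partial \texttt{W[:,j]}} = \left( \texttt{probs(j)} - \texttt{(y[i] == j)} \right) \texttt{X[i]}$$

This is exactly the update dW[:, k] += (probs_k - (k == y[i])) * X[i] in the code.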
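One way to gain confidence in the derived expression is to compare it with a centered finite-difference gradient. The sketch below is self-contained: `softmax_loss_loops` is a hypothetical stand-in for softmax_loss_naive, the data are small random arrays, and the regularization gradient is taken as 2 * reg * W, the derivative of reg * np.sum(W * W).

```python
import numpy as np

def softmax_loss_loops(W, X, y, reg):
    """Hypothetical stand-in for softmax_loss_naive; returns (loss, dW)."""
    N, C = X.shape[0], W.shape[1]
    loss, dW = 0.0, np.zeros_like(W)
    for i in range(N):
        s = X[i].dot(W)
        s = s - np.max(s)                     # stability shift
        p = np.exp(s) / np.sum(np.exp(s))     # softmax probabilities
        loss += -np.log(p[y[i]])
        for j in range(C):
            # Derived per-sample gradient: (probs(j) - (y[i] == j)) * X[i]
            dW[:, j] += (p[j] - (j == y[i])) * X[i]
    loss = loss / N + reg * np.sum(W * W)
    dW = dW / N + 2 * reg * W
    return loss, dW

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
y = rng.integers(0, 3, size=5)
W = rng.standard_normal((4, 3)) * 0.01
reg = 0.1

_, dW = softmax_loss_loops(W, X, y, reg)

# Centered finite differences, one weight at a time.
h = 1e-5
num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += h
    Wm[idx] -= h
    num[idx] = (softmax_loss_loops(Wp, X, y, reg)[0]
                - softmax_loss_loops(Wm, X, y, reg)[0]) / (2 * h)

max_abs_diff = np.max(np.abs(num - dW))
print(max_abs_diff)   # should be tiny compared with the gradient entries
```

If the analytic and numerical gradients disagreed by more than rounding error, the derivation or the loop would contain a mistake.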