cs231n assignment (1.3): Softmax classifier
The goals of the Softmax classifier exercise are to:
- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized analytic gradient of that loss
- check the analytic gradient with numerical gradient checking
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights
1. Data preparation and preprocessing
The data preparation and preprocessing here are identical to those in the previous SVM post, so I skip the code and list only the resulting shapes:
Train data shape: (49000, 3073)
Train labels shape: (49000,)
Validation data shape: (1000, 3073)
Validation labels shape: (1000,)
Test data shape: (1000, 3073)
Test labels shape: (1000,)
dev data shape: (500, 3073)
dev labels shape: (500,)
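For reference, a minimal sketch of that pipeline, assuming the course-provided `load_CIFAR10` helper from `cs231n.data_utils`; the split sizes, mean-image subtraction, and appended bias column are what produce the 3073-dimensional rows above:

```python
import numpy as np
from cs231n.data_utils import load_CIFAR10  # course-provided helper (assumption)

X_train, y_train, X_test, y_test = load_CIFAR10('cs231n/datasets/cifar-10-batches-py')

# Split: 49000 train / 1000 val / 1000 test / 500 dev.
X_val, y_val = X_train[49000:50000], y_train[49000:50000]
X_dev, y_dev = X_train[:500].copy(), y_train[:500].copy()  # small dev subset
X_train, y_train = X_train[:49000], y_train[:49000]
X_test, y_test = X_test[:1000], y_test[:1000]

def preprocess(X, mean_image):
    X = np.reshape(X, (X.shape[0], -1)) - mean_image    # flatten + center
    return np.hstack([X, np.ones((X.shape[0], 1))])     # bias column -> 3073 dims

mean_image = np.mean(np.reshape(X_train, (X_train.shape[0], -1)), axis=0)
X_train, X_val = preprocess(X_train, mean_image), preprocess(X_val, mean_image)
X_test, X_dev = preprocess(X_test, mean_image), preprocess(X_dev, mean_image)
```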
2. The Softmax classifier
The Softmax function gives the class probabilities

$$P(y_i \mid x_i, W) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j}}$$

so the loss for a single example is

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$$

and the overall loss has the same form as last time:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i + R(W)$$
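As a quick sanity check of the formula, here is the loss for a single example with three classes (the scores are made up for illustration):

```python
import numpy as np

f = np.array([3.2, 5.1, -1.7])  # hypothetical class scores for one example
y = 0                           # suppose class 0 is the correct label

p = np.exp(f) / np.sum(np.exp(f))  # softmax probabilities
L = -np.log(p[y])
print(p)  # -> [0.130  0.869  0.001] (approximately)
print(L)  # -> ~2.04
```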
As last time, the assignment asks for two Softmax implementations: one with explicit loops and one fully vectorized.
A few points worth noting:
1. Remember to subtract each row's maximum from the score matrix ($f = XW$) before exponentiating; for why this leaves the result unchanged, see the course notes, and the snippet after this list demonstrates it.
2. Vectorization essentially collapses the multiply-and-accumulate steps: once the naive code is written, every loop that multiplies and sums can be replaced with a matrix operation.
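The max shift is safe because multiplying numerator and denominator by the same constant $e^{-c}$ leaves the probabilities unchanged: $\frac{e^{f_j - c}}{\sum_k e^{f_k - c}} = \frac{e^{f_j}}{\sum_k e^{f_k}}$. A small demo (scores invented for illustration):

```python
import numpy as np

f = np.array([123.0, 456.0, 789.0])  # large hypothetical scores

# Naive softmax overflows: np.exp(789.0) is inf, so inf/inf gives nan.
p_naive = np.exp(f) / np.sum(np.exp(f))

# Shifting by the max keeps every exponent <= 0, so nothing overflows.
shifted = f - np.max(f)
p_stable = np.exp(shifted) / np.sum(np.exp(shifted))

print(p_naive)   # [ 0.  0. nan] plus an overflow warning
print(p_stable)  # [~0. ~0.  1.] -- the mathematically correct values
```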
Here is my own code for each version.
The naive version:
```python
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    num_classes = W.shape[1]

    # Scores, shifted by each row's max for numerical stability.
    f = X.dot(W)
    f_max = np.max(f, axis=1, keepdims=True)
    f_scores = f - f_max
    prob = np.exp(f_scores) / np.sum(np.exp(f_scores), axis=1, keepdims=True)

    # Loss: average cross-entropy plus L2 regularization.
    for i in range(num_train):
        loss += -np.log(prob[i, y[i]])
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # Gradient: dL_i/df_j = p_j - 1{j == y_i}, chained with df/dW = x_i.
    for i in range(num_train):
        for j in range(num_classes):
            if j == y[i]:
                dW[:, j] += (prob[i, j] - 1) * X[i, :]
            else:
                dW[:, j] += prob[i, j] * X[i, :]
    dW /= num_train
    dW += reg * W

    return loss, dW
```
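With the naive version written, the analytic gradient can be compared against a numerical one, as the assignment requires. A sketch using the course's `grad_check_sparse` utility (the dev split and the zero regularization follow the notebook; treat the exact import path as an assumption):

```python
import numpy as np
from cs231n.gradient_check import grad_check_sparse  # course utility (assumption)

W = np.random.randn(3073, 10) * 0.0001

# With tiny random weights, the loss should be close to -log(0.1) ~= 2.3.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# Relative error between analytic and numeric gradient at 10 random entries
# should be very small (around 1e-7 or less).
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_check_sparse(f, W, grad, 10)
```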
The vectorized version:
```python
def softmax_loss_vectorized(W, X, y, reg):
    """Softmax loss function, fully vectorized version (no explicit loops)."""
    loss = 0.0
    dW = np.zeros_like(W)
    num_train = X.shape[0]

    # Scores, shifted by each row's max for numerical stability.
    f = X.dot(W)
    f_max = np.max(f, axis=1, keepdims=True)
    f_scores = f - f_max
    prob = np.exp(f_scores) / np.sum(np.exp(f_scores), axis=1, keepdims=True)

    # Loss: pick out each example's correct-class probability in one shot.
    loss = np.sum(-np.log(prob[np.arange(num_train), y]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    # Gradient: subtract 1 from the correct-class probabilities; the two
    # nested loops of the naive version then collapse into one matmul.
    prob[np.arange(num_train), y] -= 1
    dW = X.T.dot(prob)
    dW = dW / num_train + reg * W

    return loss, dW
```
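To confirm the two implementations agree (and that vectorization pays off), a notebook-style comparison can time both and check that the outputs match; a minimal sketch:

```python
import time
import numpy as np

tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
print('naive loss: %e computed in %fs' % (loss_naive, time.time() - tic))

tic = time.time()
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
print('vectorized loss: %e computed in %fs' % (loss_vec, time.time() - tic))

# Both loss values and gradients should match to numerical precision.
print('loss difference: %f' % np.abs(loss_naive - loss_vec))
print('gradient difference: %f' % np.linalg.norm(grad_naive - grad_vec, ord='fro'))
```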
3. Tuning hyperparameters and visualizing the result
Tuning on the validation set, I reached 35.2% validation accuracy and 32.9% test accuracy.
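The search itself is a small grid over learning rates and regularization strengths, keeping whichever model does best on the validation set. A sketch assuming the assignment's `Softmax` classifier class with its usual `train`/`predict` interface; the grid values are just plausible starting points:

```python
import numpy as np
from cs231n.classifiers import Softmax  # assignment wrapper class (assumption)

results = {}
best_val, best_softmax = -1, None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

for lr in learning_rates:
    for reg in regularization_strengths:
        model = Softmax()
        model.train(X_train, y_train, learning_rate=lr, reg=reg,
                    num_iters=1500, verbose=False)
        train_acc = np.mean(model.predict(X_train) == y_train)
        val_acc = np.mean(model.predict(X_val) == y_val)
        results[(lr, reg)] = (train_acc, val_acc)
        if val_acc > best_val:  # keep the best model by validation accuracy
            best_val, best_softmax = val_acc, model

print('best validation accuracy: %f' % best_val)
print('test accuracy: %f' % np.mean(best_softmax.predict(X_test) == y_test))
```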
Finally, the image of the learned weight matrix:
It still looks quite similar to the SVM's templates.
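For completeness, the visualization just strips the bias row from the learned weights, reshapes each column back into a 32x32x3 image, and rescales it to [0, 255]; a sketch, assuming the trained classifier stores its weights in `best_softmax.W`:

```python
import numpy as np
import matplotlib.pyplot as plt

w = best_softmax.W[:-1, :]  # drop the bias row: (3072, 10)
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

for i in range(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to 0..255 so they display as an image.
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()
```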