About CS231n
For details, see CS231n Course Notes 1: Introduction.
This article reflects the author's own thinking; its correctness has not been verified, and feedback is welcome.
Assignment Notes
1. Cross-Validation & Hyperparameter Tuning for SVM
For the details of cross-validation, see CS231n Course Notes 5.4: Hyperparameter Selection & Cross-Validation.
The cross-validation code is identical to that in the SVM assignment; for details, see CS231n Assignment Notes 1.4: Stochastic Gradient Descent (SGD) and CS231n Assignment Notes 1.3: The SVM Loss Function and Backpropagation (Non-Vectorized and Vectorized).
1.1. Randomly generate combinations within the range, then obtain the validation accuracy
The SVM has two hyperparameters to tune: the learning rate and the regularization strength. Fortunately, the course provides search ranges for both, so we can run cross-validation directly within them:
learning_rates = [5e-9, 1e-8]
regularization_strengths = [1e6, 5e6]
results = {}
best_val = -1 # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.
rand_tuple = np.random.rand(50, 2)
rand_tuple[:, 0] = rand_tuple[:, 0] * (learning_rates[1] - learning_rates[0]) + learning_rates[0]
rand_tuple[:, 1] = rand_tuple[:, 1] * (regularization_strengths[1] - regularization_strengths[0]) + regularization_strengths[0]
for lr, rs in rand_tuple:
    svm = LinearSVM()
    svm.train(X_train_feats, y_train, learning_rate=lr, reg=rs, num_iters=1000, verbose=False)
    y_train_pred = svm.predict(X_train_feats)
    train_acc = np.mean(y_train == y_train_pred)
    y_val_pred = svm.predict(X_val_feats)
    val_acc = np.mean(y_val == y_val_pred)
    results[(lr, rs)] = (train_acc, val_acc)
    if val_acc > best_val:
        best_val = val_acc
        best_svm = svm
1.2. Plot the training results, look at the trend, and adjust the search range
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]
colors = [results[x][1] for x in results]
plt.scatter(x_scatter,y_scatter,100,colors)
plt.colorbar()
plt.xlabel("log learning rate")
plt.ylabel("log regularization strength")
plt.title("CIFAR-10 validation accuracy")
plt.show()
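Since both axes of the scatter plot are on a log scale, it can also make sense to sample each hyperparameter log-uniformly rather than uniformly, so every order of magnitude in the range gets equal coverage. A minimal sketch (the helper name `log_uniform` is my own, and the ranges are the ones used above):

```python
import numpy as np

def log_uniform(low, high, size):
    """Sample log-uniformly in [low, high]; both bounds must be positive."""
    return 10 ** np.random.uniform(np.log10(low), np.log10(high), size)

lrs = log_uniform(5e-9, 1e-8, 50)   # learning rate candidates
regs = log_uniform(1e6, 5e6, 50)    # regularization strength candidates
```

For the narrow ranges above the difference is small, but once a range spans several orders of magnitude, uniform sampling would concentrate almost all candidates near the upper end.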
2. Cross-Validation & Hyperparameter Tuning for Neural Networks
This part is also basically the same as the code for running the neural network on the raw input; for details, see CS231n Assignment Notes 1.6: Loss and Gradient Computation for Neural Networks.
2.1. Determine rough ranges for the hyperparameters
The neural network has three hyperparameters to tune: the number of hidden units, the learning rate, and the regularization strength. The course gives a rough value for the hidden layer size: around 500. To find rough ranges for the other two, plot the loss curve and the train_acc/val_acc curves from a training run. If the loss does not converge, change the learning rate; the ideal loss curve decays roughly exponentially. If the gap between train_acc and val_acc is too large, increase the regularization strength; if it is too small, decrease it.
For the remaining hyperparameters, common default values are fine.
Note: this is the coarse-tuning stage, so take steps as large as possible when changing a hyperparameter.
net = TwoLayerNet(input_dim, hidden_dim, num_classes)
stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
                  num_iters=1500, batch_size=300,
                  learning_rate=1e-1, learning_rate_decay=0.9,
                  reg=7e-4, verbose=True)
plt.subplot(2,1,1)
plt.plot(stats["loss_history"])
plt.subplot(2,1,2)
plt.plot(stats["train_acc_history"])
plt.plot(stats["val_acc_history"])
plt.show()
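The adjustment rules described above can also be sketched as a small diagnostic helper. This is only an illustration of the heuristics (the function name `suggest_adjustment` and the gap thresholds are my own, not part of the assignment code); it assumes the `stats` dict returned by `net.train`:

```python
def suggest_adjustment(stats, acc_gap_hi=0.1, acc_gap_lo=0.02):
    """Heuristic coarse-tuning hints based on the training history."""
    loss = stats["loss_history"]
    train_acc = stats["train_acc_history"][-1]
    val_acc = stats["val_acc_history"][-1]
    hints = []
    # A loss that has not decreased suggests the learning rate is off.
    if loss[-1] >= loss[0]:
        hints.append("loss not converging: change the learning rate")
    gap = train_acc - val_acc
    if gap > acc_gap_hi:
        hints.append("train/val gap too large: increase regularization strength")
    elif gap < acc_gap_lo:
        hints.append("train/val gap very small: decrease regularization strength")
    return hints
```

In practice reading the plotted curves directly is just as quick; the helper merely makes the decision rules explicit.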
2.2. Randomly sample hyperparameter combinations and record the corresponding accuracies
This is the fine-tuning stage: randomly sample hyperparameter combinations within the specified search ranges and record the resulting accuracy of each.
hidden_size = [455,455]
lr = [9e-2,9e-1]
rg = [1e-4,1e-3]
num_net = 50
params = np.random.rand(num_net,3)
params[:,0] = params[:,0]*(hidden_size[1]-hidden_size[0])+hidden_size[0]
params[:,1] = params[:,1]*(lr[1]-lr[0])+lr[0]
params[:,2] = params[:,2]*(rg[1]-rg[0])+rg[0]
results = {}
best_net = None
best_val_acc = 0
for hs, l, r in params:
    net = TwoLayerNet(input_dim, int(hs), num_classes)
    # Train the network
    stats = net.train(X_train_feats, y_train, X_val_feats, y_val,
                      num_iters=1500, batch_size=200,
                      learning_rate=l, learning_rate_decay=0.95,
                      reg=r, verbose=False)
    # Predict on the validation set
    val_acc = (net.predict(X_val_feats) == y_val).mean()
    results[(hs, l, r)] = val_acc
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_net = net
for hs, l, r in results:
    print(hs, l, r, results[(hs, l, r)])
print("best validation accuracy:", best_val_acc)
2.3. Plot to identify the trend
Plot the resulting accuracies to identify the trend, then adjust the search range accordingly.
import math
x_scatter = [math.log10(x[1]) for x in results]
y_scatter = [math.log10(x[2]) for x in results]
colors = [results[x] for x in results]
plt.scatter(x_scatter,y_scatter,100,colors)
plt.colorbar()
#plt.xlabel("hidden layer size")
plt.xlabel("log learning rate")
plt.ylabel("log regularization strength")
plt.title("CIFAR-10 validation accuracy")
plt.show()
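After reading the trend off the plot, the search range can also be narrowed programmatically around the best-performing combinations. A sketch under the assumption that `results` maps `(hidden_size, lr, reg)` tuples to validation accuracy, as in the loop above (the helper name `narrow_range` is my own):

```python
def narrow_range(results, k=5):
    """Return (lr, reg) search bounds spanning the top-k configurations."""
    top = sorted(results, key=results.get, reverse=True)[:k]
    lrs = [p[1] for p in top]
    rgs = [p[2] for p in top]
    return (min(lrs), max(lrs)), (min(rgs), max(rgs))
```

The returned bounds can be fed back into the random sampling step in 2.2 for another, finer round of search.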