The previous few posts analyzed each parameter-optimization method on its own; this one compares them side by side. The code follows chapter 6 of Saito Koki's Deep Learning from Scratch (the "fish book").
The experiment trains on the 60,000 MNIST training images using a 5-layer fully connected network (4 hidden layers with 100 neurons each) for 2,000 iterations. The figure below shows the training loss over the course of training:
SGD converges the slowest, while AdaGrad is the fastest here and also reaches the highest final accuracy. This is not guaranteed in general; it depends on the data and the network.
Part of the iteration log:
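For reference, the update rules behind the four optimizers can be sketched as below. This follows the book's optimizer.py interface (params and grads are dicts of NumPy arrays); the default hyperparameters shown are common choices, not values taken from this experiment.

```python
import numpy as np

class SGD:
    """Vanilla SGD: W <- W - lr * grad."""
    def __init__(self, lr=0.01):
        self.lr = lr
    def update(self, params, grads):
        for key in params:
            params[key] -= self.lr * grads[key]

class Momentum:
    """SGD with momentum: v <- momentum*v - lr*grad; W <- W + v."""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum, self.v = lr, momentum, None
    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
        for key in params:
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]

class AdaGrad:
    """Per-parameter learning rates: h accumulates squared gradients."""
    def __init__(self, lr=0.01):
        self.lr, self.h = lr, None
    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(v) for k, v in params.items()}
        for key in params:
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

class Adam:
    """Adam: moving averages of the gradient (m) and its square (v),
    with a bias-corrected step size."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.iter, self.m, self.v = 0, None, None
    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(v) for k, v in params.items()}
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
        for key in params:
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
```

The shared update(params, grads) interface is what lets the main script below loop over all four optimizers with the same training code.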
===========iteration:1200===========
SGD:0.2986528195291609
Momentum:0.1037981040196782
AdaGrad:0.0668137679448615
Adam:0.05010293181776089
===========iteration:1300===========
SGD:0.17833478097202
Momentum:0.06128433751079029
AdaGrad:0.01779291355463178
Adam:0.036788168826807605
===========iteration:1400===========
SGD:0.30288604165486865
Momentum:0.07708723420976107
AdaGrad:0.036239187352732696
Adam:0.03584596636673899
===========iteration:1500===========
SGD:0.21648932214740826
Momentum:0.11593046640138721
AdaGrad:0.033343153287890816
Adam:0.039999528396092415
===========iteration:1600===========
SGD:0.23519516569365168
Momentum:0.06509188355944322
AdaGrad:0.0377409654184555
Adam:0.05803067028715449
===========iteration:1700===========
SGD:0.28851197390150085
Momentum:0.14561108131745754
AdaGrad:0.07160438141432544
Adam:0.07280250583341145
===========iteration:1800===========
SGD:0.14382629146685216
Momentum:0.03977221072571262
AdaGrad:0.015159891599626725
Adam:0.019623602905335474
===========iteration:1900===========
SGD:0.19067465612724083
Momentum:0.053986168113818435
AdaGrad:0.03665586658910679
Adam:0.038508895473566646
Main code (the complete code ships with the fish book; it can be downloaded from the Turing Community site):
# coding: utf-8
# OptimizerCompare.py
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from MultiLayerNet import MultiLayerNet
from util import smooth_curve
from optimizer import *

# 0: load the MNIST data ==========
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]
batch_size = 128
max_iterations = 2000

# 1: experiment setup ==========
optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()
# optimizers['RMSprop'] = RMSprop()

networks = {}
train_loss = {}
for key in optimizers.keys():
    networks[key] = MultiLayerNet(
        input_size=784, hidden_size_list=[100, 100, 100, 100],
        output_size=10)
    train_loss[key] = []

# 2: training loop ==========
for i in range(max_iterations):
    # draw one random mini-batch, shared by all four networks
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in optimizers.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))

# 3: plot the loss curves ==========
markers = {"SGD": "o", "Momentum": "x", "AdaGrad": "s", "Adam": "D"}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x, smooth_curve(train_loss[key]),
             marker=markers[key], markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 1)
plt.legend()
plt.show()
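The smooth_curve used in the plotting step comes from the book's util module (which smooths with a Kaiser window). If you don't have that module handy, a minimal moving-average substitute (an assumption, not the book's exact code) that returns a curve of the same length is:

```python
import numpy as np

def smooth_curve(x, window=11):
    """Simple moving-average smoother for a 1-D loss curve.

    Reflect-pads both ends so the output has the same length as the
    input; a stand-in for the book's util.smooth_curve.
    """
    x = np.asarray(x, dtype=float)
    pad = window // 2
    # mirror `pad` samples at each end to avoid edge shrinkage
    padded = np.r_[x[pad:0:-1], x, x[-2:-pad - 2:-1]]
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")
```

Without some smoothing, the per-mini-batch losses of all four optimizers are noisy enough that the curves overlap and are hard to compare visually.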