Neural network training techniques: comparing four parameter-optimization methods (SGD, Momentum, AdaGrad, Adam) on the MNIST dataset

The previous posts analyzed each of these parameter-optimization methods on its own; this post compares them side by side. The code follows Chapter 6 of Saito's "red fish book" (Deep Learning from Scratch).
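
As a quick recap of what is being compared, the four update rules can be sketched roughly as below. This is a minimal sketch following the `update(params, grads)` interface of the book's optimizer module; the default hyperparameters (learning rates, momentum, beta values) are the commonly used ones and are my assumption here, so check the companion code for the exact implementation.

# optimizer_sketch.py -- minimal sketch of the four update rules
import numpy as np

class SGD:
    # W <- W - lr * dW
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

class Momentum:
    # v <- momentum * v - lr * dW;  W <- W + v
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum, self.v = lr, momentum, None

    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]

class AdaGrad:
    # h <- h + dW*dW;  W <- W - lr * dW / sqrt(h)
    def __init__(self, lr=0.01):
        self.lr, self.h = lr, None

    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

class Adam:
    # roughly Momentum plus per-parameter scaling, with bias correction
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.iter, self.m, self.v = 0, None, None

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(p) for k, p in params.items()}
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
        for key in params.keys():
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)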

The experiment trains on the 60,000 training images of the MNIST dataset using a 5-layer fully connected network (4 hidden layers of 100 neurons each) for 2,000 iterations. The figure below shows how the training loss changes with the number of iterations:

As the plot shows, SGD converges the slowest, while AdaGrad is the fastest here and also ends with the higher recognition accuracy. This is not a universal rule, though; the outcome also depends on the data.

(Figure: training loss vs. iteration count for SGD, Momentum, AdaGrad, and Adam)

Part of the training log is shown below:

===========iteration:1200===========
SGD:0.2986528195291609
Momentum:0.1037981040196782
AdaGrad:0.0668137679448615
Adam:0.05010293181776089
===========iteration:1300===========
SGD:0.17833478097202
Momentum:0.06128433751079029
AdaGrad:0.01779291355463178
Adam:0.036788168826807605
===========iteration:1400===========
SGD:0.30288604165486865
Momentum:0.07708723420976107
AdaGrad:0.036239187352732696
Adam:0.03584596636673899
===========iteration:1500===========
SGD:0.21648932214740826
Momentum:0.11593046640138721
AdaGrad:0.033343153287890816
Adam:0.039999528396092415
===========iteration:1600===========
SGD:0.23519516569365168
Momentum:0.06509188355944322
AdaGrad:0.0377409654184555
Adam:0.05803067028715449
===========iteration:1700===========
SGD:0.28851197390150085
Momentum:0.14561108131745754
AdaGrad:0.07160438141432544
Adam:0.07280250583341145
===========iteration:1800===========
SGD:0.14382629146685216
Momentum:0.03977221072571262
AdaGrad:0.015159891599626725
Adam:0.019623602905335474
===========iteration:1900===========
SGD:0.19067465612724083
Momentum:0.053986168113818435
AdaGrad:0.03665586658910679
Adam:0.038508895473566646

Main code (the complete program is included in the book's companion download, available from the Turing Community page for the book):

# coding: utf-8
# OptimizerCompare.py

import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from MultiLayerNet import MultiLayerNet
from util import smooth_curve
from optimizer import *


# 0: Load the MNIST data ==========
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

train_size = x_train.shape[0]
batch_size = 128
max_iterations = 2000

# 1: Experiment setup ==========
optimizers = {}
optimizers['SGD'] = SGD()
optimizers['Momentum'] = Momentum()
optimizers['AdaGrad'] = AdaGrad()
optimizers['Adam'] = Adam()
# optimizers['RMSprop'] = RMSprop()

networks = {}
train_loss = {}
for key in optimizers.keys():
    networks[key] = MultiLayerNet(
        input_size=784, hidden_size_list=[100, 100, 100, 100],
        output_size=10)
    train_loss[key] = []

# 2: Start training ==========
for i in range(max_iterations):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in optimizers.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizers[key].update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
        print("===========" + "iteration:" + str(i) + "===========")
        for key in optimizers.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))

# 3: Plot the results ==========
markers = {"SGD": "o", "Momentum": "x", "AdaGrad": "s", "Adam": "D"}
x = np.arange(max_iterations)
for key in optimizers.keys():
    plt.plot(x, smooth_curve(train_loss[key]), marker=markers[key],
             markevery=100, label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 1)
plt.legend()
plt.show()
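
One note on the plot: because each loss value is computed on a different random mini-batch, the raw curves are quite noisy, which is why smooth_curve from the book's util module is applied before plotting. The exact filter is in the companion code; a rough stand-in (a simple moving average, my assumption rather than the original implementation) could look like this:

# Rough stand-in for util.smooth_curve: a plain moving average.
# Assumption only -- the book's version may use a different window/filter.
import numpy as np

def smooth_curve_simple(x, window=10):
    x = np.asarray(x, dtype=float)
    if len(x) < window:
        return x
    kernel = np.ones(window) / window
    # mode='same' keeps the output the same length as the input so it can
    # replace smooth_curve in the plotting loop; edge values are damped.
    return np.convolve(x, kernel, mode='same')

With the smoothing applied, the four loss curves can be compared by eye without the mini-batch noise dominating the picture.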