While running a PyTorch handwritten-digit (MNIST) recognition experiment, I found that after executing the program several times, vanishing gradients appeared even with only 4 convolutional layers.
According to the book, the network should train well with around 10 convolutional layers, but running the book's own reference code reproduced the same problem.
The source code from the book is as follows:
import torch
import torchvision

# Hyperparameters
batch_size = 100
input_size = 784
hidden_size = 1000
num_classes = 10
num_epochs = 5
learning_rate = 0.001
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Download the MNIST dataset via TorchVision
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=torchvision.transforms.ToTensor(),
                                           download=True)
test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          transform=torchvision.transforms.ToTensor())

# Use PyTorch's DataLoader to load the data in shuffled mini-batches
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

conv_layer_number = 10

class NeuralNetwork(torch.nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.conv_start = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, 3, 1, 1),
            torch.nn.ReLU()
        )
        # With kernel size 3, stride 1, and padding 1, the spatial size
        # is unchanged, so these blocks can be stacked directly
        self.conv_loop = torch.nn.Sequential(
            torch.nn.Conv2d(16, 16, 3, 1, 1),
            torch.nn.ReLU()
        )
        self.conv_end = torch.nn.Sequential(
            torch.nn.Conv2d(16, 1, 3, 1, 1),
            torch.nn.ReLU()
        )
        self.fc = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = self.conv_start(x)
        # Note: the same conv_loop block (shared weights) is applied
        # conv_layer_number times
        for i in range(conv_layer_number):
            x = self.conv_loop(x)
        x = self.conv_end(x)
        x = self.fc(x.reshape(-1, 28 * 28))
        return x

model = NeuralNetwork().to(device)

# Loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backpropagate to compute the gradient of the loss w.r.t. each parameter
        optimizer.zero_grad()
        loss.backward()
        # Update the parameters
        optimizer.step()

# Check the model's accuracy on the test set
correct = 0
total = 0
for images, labels in test_loader:
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
    _, predicted = torch.max(outputs, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print('Accuracy on test_set: {} %'.format(100 * correct / total))
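To check whether gradients are actually vanishing (rather than the run simply continuing from already-converged weights), one option is to inspect per-layer gradient norms right after loss.backward(). A minimal sketch, using a small stand-in network and a single random batch instead of real MNIST data:

```python
import torch

# Tiny stand-in for the model above: a few stacked conv blocks plus a classifier
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, 1, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, 1, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, 1, 1), torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 10),
)

# One random batch instead of real MNIST, just to exercise backward()
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

loss = torch.nn.functional.cross_entropy(model(images), labels)
model.zero_grad()
loss.backward()

# Collect the gradient L2 norm of every parameter tensor;
# norms close to zero in the early layers suggest vanishing gradients
grad_norms = {name: p.grad.norm().item()
              for name, p in model.named_parameters()}
for name, norm in grad_norms.items():
    print(f'{name}: {norm:.3e}')
```

If all norms stay near zero from the very first iteration of a fresh run, the problem is in the architecture; if they only collapse on repeated runs in the same session, stale state is the more likely cause.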
After further experimenting, the cause appears to be Spyder's run configuration: it can be set to either keep or clear the variables from the previous run. I had not cleared them, so each run continued from parameters that had already converged to a region where the gradients were nearly zero, which looked like vanishing gradients.
For how to configure Spyder to clear all variables before each run, see this post:
https://www.cnblogs.com/xiangsui/p/12971447.html
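Independently of Spyder's setting, the script itself can guarantee a fresh start by explicitly re-initializing the model's parameters at the top of each run. A minimal sketch, assuming the standard reset_parameters() hook that PyTorch's built-in layers (Conv2d, Linear, etc.) provide:

```python
import torch

def reinitialize(model):
    # Re-run the default initializer on every submodule that defines one,
    # so repeated runs in the same interpreter start from fresh weights
    for module in model.modules():
        if hasattr(module, 'reset_parameters'):
            module.reset_parameters()

model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, 1, 1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 28 * 28, 10),
)

before = model[0].weight.clone()
reinitialize(model)
after = model[0].weight
# Weights are re-drawn at random, so they should differ from the old values
changed = not torch.equal(before, after)
print(changed)
```

Simply re-instantiating the model (model = NeuralNetwork().to(device)) achieves the same effect, as long as that line actually runs on every execution rather than being skipped in an interactive session.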