Series Contents
Machine Learning Notes: Gradient Descent and Backpropagation
Machine Learning Notes: Linear Regression with PyTorch
Machine Learning Notes: Logistic Regression with PyTorch
Machine Learning Notes: Multilevel (Linear Regression) Model
Deep Learning Notes: Building Datasets in PyTorch (Dataset and DataLoader)
Deep Learning Notes: Multi-Class Classification with PyTorch
Deep Learning Notes: Convolutional Neural Networks (CNN) with PyTorch
Deep Learning Notes: Advanced CNN
Deep Learning Notes: Recurrent Neural Networks (RNN)
Deep Learning Notes: Implementing a GRU with PyTorch
Preface
Reference video: the Bilibili series 《PyTorch深度学习实践》 by 刘二大人
I. GoogLeNet
1. The 1×1 Convolution Kernel
The 1×1 convolution kernel is also known as "Network in Network" (NIN).
A 1×1 convolution generally changes only the number of output channels; it does not change the output width or height.
By applying 1×1 convolutions we can change the channel dimension of the data and substantially reduce the number of operations.
1×1 convolution kernels are very useful in neural networks; a network built around them is called a NIN (Network in Network).
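As a quick sketch of both points, the snippet below checks the output shape of a 1×1 convolution and compares multiplication counts for reducing 192 channels to 32 on a 28×28 feature map. The specific channel counts (192 → 16 → 32) are illustrative assumptions, not values from this article's network:

```python
import torch

# A 1x1 convolution only mixes information across channels:
# it maps (N, C_in, H, W) -> (N, C_out, H, W), leaving H and W unchanged.
x = torch.randn(1, 192, 28, 28)                 # e.g. 192 input channels
conv1x1 = torch.nn.Conv2d(192, 16, kernel_size=1)
y = conv1x1(x)
print(y.shape)                                  # torch.Size([1, 16, 28, 28])

# Multiplication counts for mapping 192 -> 32 channels on a 28x28 map:
direct = 5 * 5 * 28 * 28 * 192 * 32             # one 5x5 conv directly
reduced = (1 * 1 * 28 * 28 * 192 * 16           # 1x1 conv to shrink to 16 channels
           + 5 * 5 * 28 * 28 * 16 * 32)         # then the 5x5 conv
print(direct, reduced)                          # about 120.4 million vs. 12.4 million
```

The 1×1 reduction cuts the operation count by roughly a factor of ten while leaving the spatial size untouched.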
2. The Inception Module
GoogLeNet is composed of many convolutional layers forming a rather complex network. Building that many layers by hand is tedious, so the Inception Module was proposed to improve code reuse.
This is an Inception Module:
To implement this Inception Module:
each branch applies its own sequence of convolutional layers, each branch forms a sub-module, and all branch outputs are concatenated together (Concatenate) along the channel dimension.
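The concatenation step can be sketched on its own. The channel counts below (16, 24, 24, 24) match the InceptionA implementation in the code section; the batch size and spatial size are arbitrary:

```python
import torch

# Four branch outputs with the same batch, height, and width
# but different channel counts:
b1 = torch.randn(8, 16, 28, 28)   # 1x1 branch
b2 = torch.randn(8, 24, 28, 28)   # 5x5 branch
b3 = torch.randn(8, 24, 28, 28)   # 3x3 branch
b4 = torch.randn(8, 24, 28, 28)   # pooling branch
out = torch.cat([b1, b2, b3, b4], dim=1)  # concatenate along the channel dimension
print(out.shape)                  # torch.Size([8, 88, 28, 28]) -- 16+24+24+24 = 88
```

Because the concatenation is along dim 1, the branches must agree on batch size, width, and height, but not on channel count.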
二、Residual Network
研究人员发现,在CIFAR-10数据集中,一直叠加3✖3的卷积神经网络,56层3✖3的卷积神经网络效果不如20层的卷积神经网络。
引出一个问题——梯度消失
即当所有梯度都小于1时,众多梯度相乘的值接近于0,权重更新就会陷入停滞。
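A quick numeric illustration of that product (the per-layer factor 0.9 is an arbitrary assumption for demonstration):

```python
# If each layer contributes a gradient factor slightly below 1,
# the product over many layers collapses toward zero:
factor = 0.9
for depth in (5, 20, 56):
    print(depth, factor ** depth)
# At 56 layers the product is about 0.0027, so early layers
# receive almost no gradient signal.
```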
To address the vanishing-gradient problem, the Residual Network (ResNet) was proposed.
A residual block adds the input x back into its output, computing H(x) = F(x) + x; when differentiating, H'(x) = F'(x) + 1, which keeps the gradient close to 1 even when F'(x) is small.
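This can be checked directly with autograd. The toy mapping f(x) = 0.001·x below is my own stand-in for a branch with a very small derivative:

```python
import torch

# Compare gradients through a residual mapping h(x) = f(x) + x.
# Even when f'(x) is tiny, h'(x) = f'(x) + 1 stays close to 1.
x = torch.tensor(2.0, requires_grad=True)
f = 0.001 * x          # a toy branch with derivative 0.001
h = f + x              # the skip connection adds the identity
h.backward()
print(x.grad)          # tensor(1.0010) -- i.e. f'(x) + 1
```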
Comparison of a plain network and a Residual network.
Residual block code illustration; a residual block does not change the channel count, width, or height of its input.
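A minimal sketch of such a shape-preserving residual block (the 3×3 kernels with padding=1 and the ReLU placement follow the common pattern from the referenced lecture; treat the details as assumptions):

```python
import torch
import torch.nn.functional as F

class ResidualBlock(torch.nn.Module):
    """A residual block that preserves channels, width, and height."""
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        # A 3x3 kernel with padding=1 keeps the spatial size unchanged,
        # and in_channels == out_channels keeps the channel count unchanged.
        self.conv1 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)   # skip connection: add the input before the final ReLU

block = ResidualBlock(16)
x = torch.randn(4, 16, 28, 28)
print(block(x).shape)          # torch.Size([4, 16, 28, 28]) -- shape unchanged
```

Because the input and output shapes match, x + y is well defined and the block can be stacked freely between convolution and pooling layers.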
Residual network code illustration.
Running it achieves better results.
Various Residual Network variants.
III. Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
# Inception
class InceptionA(torch.nn.Module):
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1X1 = torch.nn.Conv2d(in_channels=in_channels, out_channels=16, kernel_size=1)  # 1x1 convolution
        self.branch5X5_1 = torch.nn.Conv2d(in_channels, out_channels=16, kernel_size=1)
        self.branch5X5_2 = torch.nn.Conv2d(in_channels=16, out_channels=24, kernel_size=5, padding=2)
        self.branch3X3_1 = torch.nn.Conv2d(in_channels, out_channels=16, kernel_size=1)
        self.branch3X3_2 = torch.nn.Conv2d(in_channels=16, out_channels=24, kernel_size=3, padding=1)
        self.branch3X3_3 = torch.nn.Conv2d(in_channels=24, out_channels=24, kernel_size=3, padding=1)
        self.branch_pool = torch.nn.Conv2d(in_channels=in_channels, out_channels=24, kernel_size=1)  # 1x1 convolution

    def forward(self, x):
        branch1X1 = self.branch1X1(x)
        branch5X5 = self.branch5X5_1(x)
        branch5X5 = self.branch5X5_2(branch5X5)
        branch3X3 = self.branch3X3_1(x)
        branch3X3 = self.branch3X3_2(branch3X3)
        branch3X3 = self.branch3X3_3(branch3X3)
        # pooling branch
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        output = [branch1X1, branch5X5, branch3X3, branch_pool]
        # Concatenate along dim 1 (channels): the output has
        # 16 + 24 + 24 + 24 = 88 channels, i.e. shape (batch_size, 88, width, height).
        return torch.cat(output, 1)
# network model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.mp = torch.nn.MaxPool2d(2)  # max-pooling layer
        self.conv1 = torch.nn.Conv2d(1, 10, 5)  # convolutional layer
        self.conv2 = torch.nn.Conv2d(88, 20, 5)  # convolutional layer
        self.incep1 = InceptionA(10)  # Inception module
        self.incep2 = InceptionA(20)  # Inception module
        self.Linear1 = torch.nn.Linear(1408, 10)  # linear layer
        self.activate = torch.nn.ReLU()  # activation layer

    def forward(self, x):
        size = x.size(0)
        x = self.activate(self.mp(self.conv1(x)))
        x = self.incep1(x)
        x = self.activate(self.mp(self.conv2(x)))
        x = self.incep2(x)
        x = x.view(size, -1)
        x = self.Linear1(x)
        return x
# instantiate the model
model = Net()
# loss function
criterion = torch.nn.CrossEntropyLoss()
# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
# batch size
batch_size = 64
# image preprocessing
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# datasets
train_data = datasets.MNIST(root='mnist/', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='mnist/', train=False, download=True, transform=transform)
# data loaders
train_batch = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_batch = DataLoader(test_data, batch_size=batch_size, shuffle=False)
# use the GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)  # move the model to the GPU
# data for plotting
epoch_list = []
accuracy_list = []
# training
def model_train(epoch):
    running_loss = 0.0
    for batch_idx, data in enumerate(train_batch, 1):
        inputs, targets = data
        inputs, targets = inputs.to(device), targets.to(device)  # move the batch to the GPU
        optimizer.zero_grad()
        # forward
        pred = model(inputs)
        loss = criterion(pred, targets)
        # backward
        loss.backward()
        # update weights
        optimizer.step()
        # accumulate the loss
        running_loss += loss.item()
    # report the average loss per batch for this epoch
    print('epoch:%d,loss=%.4f' % (epoch, running_loss / batch_idx))
# evaluation
def model_test():
    correct = 0
    total = 0
    with torch.no_grad():
        for (inputs, targets) in test_batch:
            inputs, targets = inputs.to(device), targets.to(device)
            pred = model(inputs)
            _, predicted = torch.max(pred.data, dim=1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()
    print('Accuracy on test set:%d %%' % (100 * correct / total))
    return correct / total
if __name__ == '__main__':
    for epoch in range(10):
        epoch_list.append(epoch)
        model_train(epoch)
        accuracy = model_test()
        accuracy_list.append(accuracy)
    # plot accuracy vs. epoch
    plt.plot(epoch_list, accuracy_list)
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.show()
Running results: