- 🍨 This article is a learning-record blog for the 🔗365天深度学习训练营 (365-day deep learning training camp)
- 🍖 Original author: K同学啊 | tutoring and custom projects available
1 Introduction
ResNet (Residual Neural Network) is a deep convolutional network architecture proposed in 2015 by Kaiming He and colleagues at Microsoft Research Asia. It has been enormously successful in image classification and other computer vision tasks and remains one of the most widely used deep learning models today.
ResNet primarily addresses the "degradation problem" that appears as deep convolutional networks grow deeper.
Traditional deep convolutional networks degrade as layers are stacked: beyond a certain depth, accuracy saturates and then drops, even during training. One contributing factor is that gradients tend to vanish during backpropagation as depth increases, which makes very deep plain networks hard to train.
To solve this, ResNet introduces the idea of residual learning. Its key ingredient is the "residual block", which lets the network learn the residual relative to an identity mapping. Concretely, a residual block is built by adding a shortcut connection (skip connection) that carries the identity mapping around a stack of layers.
In a traditional convolutional network, layers are connected purely sequentially and each layer's output feeds directly into the next. In ResNet, the input of each residual block first passes through a series of convolutions and activations, is then added back to the block input, and finally goes through an activation. Writing the stacked layers as F(x), the block computes y = ReLU(F(x) + x), so the network only needs to learn the residual F(x) = H(x) − x rather than the full mapping H(x), which makes deep networks far easier to train.
ResNet's basic unit is the residual block. A basic residual block consists of two convolutional layers, each followed by batch normalization and an activation (usually ReLU). If the input and output feature maps do not match in size or channel count, an additional convolution is inserted on the shortcut to match dimensions.
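As a quick illustration of this structure, here is a minimal sketch of such a basic residual block (this two-convolution design is the one used in ResNet-18/34, not the bottleneck built later in this post; the class name and defaults are illustrative, not from the original post):
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    '''Two 3x3 convs, each with BatchNorm; the shortcut is added before the final ReLU.'''
    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # extra 1x1 conv on the shortcut only when input and output shapes differ
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # y = ReLU(F(x) + x)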
Beyond the basic block, ResNet comes in variants of different depths, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152. Thanks to residual learning, the deeper variants can achieve better performance where plain networks of the same depth would degrade.
ResNet's success demonstrated the effectiveness of residual learning and inspired the design of many later deep learning models. It is widely used in image classification, object detection, semantic segmentation, and other computer vision tasks, and has delivered excellent results in competitions and real-world applications.
The ResNet-50 architecture is already available in torchvision.models; the following code loads it and prints a summary:
import torch
from torchvision import models
from torchsummary import summary

model = models.resnet50(pretrained=False)  # newer torchvision versions prefer models.resnet50(weights=None)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
summary(model, (3, 224, 224))
[Image: torchsummary output for ResNet-50]
2 Preparation
The preparation stage covers data processing, dataset splitting, and related steps. These were explained in detail in earlier posts of this series, so only the code is shown here.
import torch
from torchvision import datasets, transforms
import torch.nn as nn
import time
import numpy as np
import matplotlib.pyplot as plt
import torchsummary as summary
from collections import OrderedDict

data_dir = '/Users/montylee/NJUPT/Learn/Github/deeplearning/CNN/J1/data/bird_photos'

def random_split_imagefolder(data_dir, transforms, random_split_rate=0.8):
    '''
    Randomly split the dataset.
    :param data_dir: dataset path
    :param transforms: preprocessing transforms
    :param random_split_rate: fraction of samples used for training
    :return: total_data, train_datasets, test_datasets
    '''
    _total_data = datasets.ImageFolder(data_dir, transform=transforms)
    train_size = int(random_split_rate * len(_total_data))
    test_size = len(_total_data) - train_size
    _train_datasets, _test_datasets = torch.utils.data.random_split(_total_data, [train_size, test_size])
    return _total_data, _train_datasets, _test_datasets
Compute the true per-channel mean and standard deviation of the dataset:
# First pass: load the images with ToTensor() only, so the statistics are computed
# on unnormalized data (this initial call is assumed; it was not shown in the original post)
total_data, _, _ = random_split_imagefolder(data_dir, transforms.ToTensor(), 0.8)

N_CHANNELS = 3  # RGB
mean = torch.zeros(N_CHANNELS)
std = torch.zeros(N_CHANNELS)
for inputs, labels in total_data:
    for i in range(N_CHANNELS):
        mean[i] += inputs[i, :, :].mean()
        std[i] += inputs[i, :, :].std()
mean.div_(len(total_data))  # in-place division (the original used .div(), which discards the result)
std.div_(len(total_data))
print(mean, std)
Reload the data, normalizing with the computed mean and standard deviation:
real_transforms = transforms.Compose(
    [
        transforms.Resize((224, 224)),  # resize to 224x224 (Resize(224) alone only fixes the shorter side, which breaks batching)
        transforms.ToTensor(),          # convert to tensor
        transforms.Normalize(mean, std)
    ])
total_data, train_datasets, test_datasets = random_split_imagefolder(data_dir, real_transforms, 0.8)
Record some class-related parameters:
class_names_dict = total_data.class_to_idx
print(total_data.class_to_idx)
{'Bananaquit': 0, 'Black Skimmer': 1, 'Black Throated Bushtiti': 2, 'Cockatoo': 3}
When mapping a predicted index back to its class name, it is usually more convenient to invert this dictionary:
class_names_dict = dict(zip(class_names_dict.values(), class_names_dict.keys()))
N_classes = len(class_names_dict)
print(class_names_dict)
{0: 'Bananaquit', 1: 'Black Skimmer', 2: 'Black Throated Bushtiti', 3: 'Cockatoo'}
3 Residual Networks
3.1 What problem do residual networks solve?
Residual networks were designed to solve the degradation problem that arises when a network has many hidden layers. Degradation means that as more hidden layers are added, the network's accuracy saturates and then drops sharply, and this drop is not caused by overfitting.
Aside: the "two dark clouds" over deep neural networks
- Vanishing/exploding gradients
  In short, when the network is too deep, training struggles to converge. This problem can be controlled effectively with proper weight initialization and normalization of intermediate layers (knowing this much is enough for now).
- Network degradation
  As depth increases, performance first improves until it saturates, then drops rapidly, and this degradation is not caused by overfitting.
3.2 ResNet-50 overview
ResNet-50 is built from two basic block types, named the Conv Block and the Identity Block:
[Image: Conv Block and Identity Block structure]
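The original post does not include definitions for these two blocks, so here is a minimal sketch that matches how they are called in the model below (the signatures ConvBlock(in_channels, kernel_size, filters, stride) and IdentityBlock(in_channels, kernel_size, filters) are inferred from that usage; the internals follow the standard ResNet-50 bottleneck design, and the imports from section 2 are assumed):
class IdentityBlock(nn.Module):
    '''1x1 -> 3x3 -> 1x1 bottleneck whose shortcut is the identity (shapes unchanged).'''
    def __init__(self, in_channels, kernel_size, filters):
        super(IdentityBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + x)  # identity shortcut

class ConvBlock(nn.Module):
    '''Same bottleneck, but with a strided 1x1 projection shortcut that changes shape.'''
    def __init__(self, in_channels, kernel_size, filters, stride=2):
        super(ConvBlock, self).__init__()
        f1, f2, f3 = filters
        self.conv1 = nn.Conv2d(in_channels, f1, 1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(f1)
        self.conv2 = nn.Conv2d(f1, f2, kernel_size, padding=kernel_size // 2, bias=False)
        self.bn2 = nn.BatchNorm2d(f2)
        self.conv3 = nn.Conv2d(f2, f3, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(f3)
        # projection shortcut matches both channel count and spatial size
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, f3, 1, stride=stride, bias=False),
            nn.BatchNorm2d(f3))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))  # projection shortcut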
4 Building the ResNet-50 Model
class Resnet50_Model(nn.Module):
    def __init__(self):
        super(Resnet50_Model, self).__init__()
        self.in_channels = 3
        # ============= stem (input) layers
        self.cov0 = nn.Conv2d(self.in_channels, out_channels=64, kernel_size=7, stride=2, padding=3)
        self.bn0 = nn.BatchNorm2d(num_features=64)
        self.relu0 = nn.ReLU(inplace=False)
        self.maxpool0 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.basic_layer = nn.Sequential(
            self.cov0,
            self.bn0,
            self.relu0,
            self.maxpool0
        )
        # ============= four residual stages (3, 4, 6, 3 blocks: the ResNet-50 layout)
        self.layer1 = nn.Sequential(
            ConvBlock(64, 3, [64, 64, 256], 1),
            IdentityBlock(256, 3, [64, 64, 256]),
            IdentityBlock(256, 3, [64, 64, 256]),
        )
        self.layer2 = nn.Sequential(
            ConvBlock(256, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
        )
        self.layer3 = nn.Sequential(
            ConvBlock(512, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
        )
        self.layer4 = nn.Sequential(
            ConvBlock(1024, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048]),
        )
        # ============= output layers
        self.avgpool = nn.AvgPool2d((7, 7))  # 7x7 average pooling leaves 2048 features
        # classification layer; CrossEntropyLoss expects raw logits, so the
        # Softmax from the original code is dropped here
        self.fc = nn.Linear(2048, N_classes)
    def forward(self, x):
        x = self.basic_layer(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
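A quick sanity check (using the block sketches from section 3.2) confirms the output shape; the dummy input below is purely illustrative:
model = Resnet50_Model()
x = torch.randn(1, 3, 224, 224)  # one dummy RGB image
print(model(x).shape)            # expected: torch.Size([1, 4]) for the four bird classes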
5 Training the Model
5.1 Training and testing functions
def train_and_test(model, loss_func, optimizer, epochs=25):
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    summary.summary(model, (3, 224, 224))
    # wrap the datasets in DataLoaders so the model receives batched 4D tensors
    # (iterating the datasets directly, as the original code did, yields unbatched samples)
    train_loader = torch.utils.data.DataLoader(train_datasets, batch_size=32, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_datasets, batch_size=32, shuffle=False)
    record = []
    best_acc = 0.0
    best_epoch = 0
    for epoch in range(epochs):  # train for `epochs` rounds
        epoch_start = time.time()
        print("Epoch: {}/{}".format(epoch + 1, epochs))
        model.train()  # training phase
        train_loss = 0.0
        train_acc = 0.0
        valid_loss = 0.0
        valid_acc = 0.0
        for i, (inputs, labels) in enumerate(train_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            # remember to zero the gradients
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_func(outputs, labels)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * inputs.size(0)
            if i % 10 == 0:
                print("train batch: {:01d} / {:03d} outputs: {}".format(i, len(train_loader), outputs.data[0]))
            ret, predictions = torch.max(outputs.data, 1)
            correct_counts = predictions.eq(labels.data.view_as(predictions))
            acc = torch.mean(correct_counts.type(torch.FloatTensor))
            train_acc += acc.item() * inputs.size(0)
        with torch.no_grad():
            model.eval()  # validation phase
            for j, (inputs, labels) in enumerate(test_loader):
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                loss = loss_func(outputs, labels)
                valid_loss += loss.item() * inputs.size(0)
                if j % 10 == 0:
                    print("val batch: {:01d} / {:03d} outputs: {}".format(j, len(test_loader), outputs.data[0]))
                ret, predictions = torch.max(outputs.data, 1)
                correct_counts = predictions.eq(labels.data.view_as(predictions))
                acc = torch.mean(correct_counts.type(torch.FloatTensor))
                valid_acc += acc.item() * inputs.size(0)
        # the original divided by undefined variables (train_datas_size etc.); use the dataset lengths
        avg_train_loss = train_loss / len(train_datasets)
        avg_train_acc = train_acc / len(train_datasets)
        avg_valid_loss = valid_loss / len(test_datasets)
        avg_valid_acc = valid_acc / len(test_datasets)
        record.append([avg_train_loss, avg_valid_loss, avg_train_acc, avg_valid_acc])
        if avg_valid_acc > best_acc:  # track the model with the best validation accuracy
            best_acc = avg_valid_acc
            best_epoch = epoch + 1
        epoch_end = time.time()
        print("Epoch: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}%, \n\t\tValidation: Loss: {:.4f}, Accuracy: {:.4f}%, Time: {:.4f}s".format(
            epoch + 1, avg_train_loss, avg_train_acc * 100, avg_valid_loss, avg_valid_acc * 100,
            epoch_end - epoch_start))
    print("Best Accuracy for validation : {:.4f} at epoch {:03d}".format(best_acc, best_epoch))
    return model, record
5.2 Test code
if __name__ == '__main__':
    epochs = 25
    model = Resnet50_Model()
    loss_func = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
    model, record = train_and_test(model, loss_func, optimizer, epochs)
    torch.save(model, './Best_Resnet50.pth')  # note: this saves the final-epoch model; best-epoch weights are not checkpointed separately

    record = np.array(record)
    plt.plot(record[:, 0:2])
    plt.legend(['Train Loss', 'Valid Loss'])
    plt.xlabel('Epoch Number')
    plt.ylabel('Loss')
    plt.ylim(0, 1.5)
    plt.savefig('Loss.png')
    plt.show()

    plt.plot(record[:, 2:4])
    plt.legend(['Train Accuracy', 'Valid Accuracy'])
    plt.xlabel('Epoch Number')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.savefig('Accuracy.png')
    plt.show()
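Finally, a minimal inference sketch (not in the original post) showing how the saved model and the inverted class_names_dict from section 2 might be used on a single image; the image path is a placeholder:
from PIL import Image

model = torch.load('./Best_Resnet50.pth', map_location='cpu', weights_only=False)  # full-model pickle; weights_only needs PyTorch >= 1.13
model.eval()
img = Image.open('some_bird.jpg').convert('RGB')  # placeholder path
x = real_transforms(img).unsqueeze(0)             # same preprocessing as training, plus a batch dimension
with torch.no_grad():
    pred = model(x).argmax(dim=1).item()
print(class_names_dict[pred])                     # e.g. 'Bananaquit'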