PyTorch Image Classification: 05 Building a ResNet Model with PyTorch

心之所向h

Last modified 2024-08-18 21:16:37

Category: PyTorch Image Classification. Tags: pytorch, classification, artificial intelligence, deep learning

First published 2024-08-12 10:07:22

Article link: https://blog.csdn.net/qq_54695558/article/details/140999825

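[Overview]: Build a ResNet model with PyTorch and use it to classify the images in flower_data
[References]: 6.1 ResNet network structure, BN, and transfer learning explained (bilibili)
ResNet network structure explained and model construction (CSDN blog)
[Full code]: 05ResNet (github.com)

Note: I am still at an early stage of learning. This article is a write-up to organize what I have studied; some statements reflect my own understanding and may not be correct. Corrections are welcome!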

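I. Fundamentals

1. Background

ResNet was proposed by Microsoft Research in 2015. It took first place in that year's ImageNet competition (ILSVRC 2015) in classification, detection and localization, and also won the COCO detection and segmentation tasks.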

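2. Highlights of the network

The highlights of the network are:

Very deep network structures (more than 1000 layers)
The residual module
Batch Normalization to speed up training (dropout is no longer used)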

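(1) How ResNet makes very deep networks possible

Before ResNet, conventional convolutional networks were built by stacking a series of convolutional and downsampling layers. Once the stack reaches a certain depth, two problems appear: (a) vanishing or exploding gradients, and (b) the degradation problem. The ResNet paper notes that vanishing and exploding gradients can largely be handled by data preprocessing and by using BN (Batch Normalization) layers in the network. The degradation problem (accuracy gets worse as more layers are added, see the left side of the figure below) had no good solution.

The ResNet paper therefore proposes the residual structure to alleviate degradation. The right side of the figure below shows convolutional networks that use residual structures: as the network gets deeper, the results no longer get worse but actually improve.
[Figure: plain networks (left) versus networks with residual structures (right) as depth increases]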

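(2) The residual structure

The figure below shows the two residual structures given in the paper. The one on the left is intended for shallower networks such as ResNet18 and ResNet34; the one on the right is for deeper networks such as ResNet50, ResNet101 and ResNet152. Why do deeper networks use the structure on the right? Because it reduces the number of parameters and the amount of computation. For a feature map with 256 input and 256 output channels, the left structure needs 3x3x256x256x2 = 1,179,648 parameters, whereas the right structure needs only 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632 (biases ignored). The right structure is clearly the better choice when building deep networks.
[Figure: the two residual structures from the paper; left: two 3x3 convolutions; right: 1x1, 3x3 and 1x1 convolutions]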
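The two parameter counts can be checked with a few lines of arithmetic. This is a minimal sketch that counts only convolution weights (no biases, no BN parameters); the helper name `conv_params` is made up for illustration.

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights of a bias-free 2D convolution: k * k * C_in * C_out."""
    return kernel_size * kernel_size * in_channels * out_channels


# Left structure: two 3x3 convolutions, 256 -> 256 channels each.
basic_params = conv_params(3, 256, 256) * 2

# Right structure: 1x1 squeezes 256 -> 64, 3x3 keeps 64, 1x1 restores 64 -> 256.
bottleneck_params = (
    conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)
)

print(basic_params)       # 1179648
print(bottleneck_params)  # 69632
print(round(basic_params / bottleneck_params, 1))  # 16.9
```

So for the same 256-channel input and output, the three-layer structure uses roughly 17 times fewer weights.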
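In the left structure, the main branch consists of two 3x3 convolutional layers, and the line on its right is the shortcut branch. Note that the output of the main branch is added to the output of the shortcut branch, so the two feature maps must have the same shape (C x H x W).

If you look closely at the ResNet34 architecture diagram, you will notice that some residual structures are drawn with dashed lines. The paper only says briefly that these dashed structures adjust the dimensions, and that the shortcut branch does so with a 1x1 convolution. The right side of the figure below shows the dashed residual structure in detail; pay attention to the stride of each convolutional layer and to the number of kernels on the shortcut branch (the same as on the main branch).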
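To see why the strides in a dashed structure make both branches line up, the standard convolution output-size formula can be applied to each branch. A small sketch, assuming a 56x56 input with 64 channels going to 128 channels (values chosen for illustration); `conv_out_size` is a hypothetical helper, not part of the article's code.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed input to a dashed block: 56x56 feature map, 64 -> 128 channels.
height, out_channels = 56, 128

# Main branch: 3x3 conv (stride 2, padding 1), then 3x3 conv (stride 1, padding 1).
main = conv_out_size(height, kernel=3, stride=2, padding=1)
main = conv_out_size(main, kernel=3, stride=1, padding=1)

# Shortcut branch: a single 1x1 conv with stride 2 and no padding.
shortcut = conv_out_size(height, kernel=1, stride=2, padding=0)

main_shape = (out_channels, main, main)
shortcut_shape = (out_channels, shortcut, shortcut)
print(main_shape, shortcut_shape)  # (128, 28, 28) (128, 28, 28)
```

Both branches end at the same C x H x W, which is what the element-wise addition requires.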

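In the right structure, the main branch uses three convolutional layers: a 1x1 convolution that compresses the channel dimension, a 3x3 convolution, and a 1x1 convolution that restores the channel dimension. (The first and second layers use the same number of kernels; the third uses four times as many.)

The corresponding dashed residual structure is shown on the right of the figure below. Its shortcut branch again has a 1x1 convolution, with as many kernels as the third convolution on the main branch; note the stride of each layer. (In the original paper, the first 1x1 convolution on the main branch of this dashed structure has stride 2 and the 3x3 convolution has stride 1. The official PyTorch implementation instead gives the first 1x1 convolution stride 1 and the 3x3 convolution stride 2, which is reported to improve top-1 accuracy by roughly 0.5%.)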
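One commonly given intuition for moving the stride to the 3x3 convolution: a 1x1 convolution with stride 2 never reads three quarters of its input positions, while a 3x3 convolution with stride 2 and padding 1 still touches every position. The sketch below counts this along one spatial axis; the helper `positions_read` and the length 56 are illustrative assumptions, and this is not claimed to be the full explanation of the accuracy gain.

```python
def positions_read(length: int, kernel: int, stride: int, padding: int) -> set:
    """Input indices (along one axis) covered by at least one kernel window."""
    out_len = (length + 2 * padding - kernel) // stride + 1
    covered = set()
    for o in range(out_len):
        start = o * stride - padding
        for i in range(start, start + kernel):
            if 0 <= i < length:  # skip the zero-padding positions
                covered.add(i)
    return covered


length = 56
one_by_one = positions_read(length, kernel=1, stride=2, padding=0)
three_by_three = positions_read(length, kernel=3, stride=2, padding=1)

# Fraction of the 2D input that is read (the same pattern applies to both axes).
frac_1x1 = (len(one_by_one) / length) ** 2
frac_3x3 = (len(three_by_three) / length) ** 2
print(frac_1x1, frac_3x3)  # 0.25 1.0
```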

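The figure below is the table from the paper that lists the configurations of ResNets of different depths. Each residual structure in the table gives the kernel size and the number of kernels on the main branch, and xN means that the structure is repeated N times. So which of these residual structures are the dashed ones?
[Figure: configuration table for ResNet18/34/50/101/152 from the paper]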

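For ResNet18/34/50/101/152, the first residual structure in each of the stages conv3_x, conv4_x and conv5_x is a dashed structure. The first block of each of these stages has to adjust the shape of the input feature map: it halves the height and width and changes the channel depth to what the following residual structures expect.

For ResNet50/101/152, the first residual structure of conv2_x is also a dashed structure, because it has to change the number of channels. According to the table, the feature map after the 3x3 max pool has shape [56, 56, 64], but the solid residual structures in conv2_x expect an input of shape [56, 56, 256] (only then do the input and output shapes match, so that the shortcut output can be added to the main-branch output). The first residual structure therefore maps [56, 56, 64] --> [56, 56, 256]. Note that only the channel dimension changes here; height and width stay the same, whereas the first dashed structure of conv3_x, conv4_x and conv5_x changes the channels and also halves the height and width.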
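The rule described above can be traced stage by stage. A minimal sketch, assuming a 224x224 input so that the feature map entering conv2_x is [56, 56, 64]; the function name `stage_shapes` and the `expansion` argument (1 for the two-layer structure, 4 for the three-layer one) are illustrative choices.

```python
def stage_shapes(expansion: int, size: int = 56, in_channels: int = 64):
    """Trace [H, W, C] through conv2_x..conv5_x and flag dashed first blocks.

    expansion is 1 for the two-layer structure (ResNet18/34) and
    4 for the three-layer structure (ResNet50/101/152).
    """
    rows = []
    for name, width, stride in [("conv2_x", 64, 1), ("conv3_x", 128, 2),
                                ("conv4_x", 256, 2), ("conv5_x", 512, 2)]:
        out_channels = width * expansion
        # The first block needs a dashed shortcut if H/W or C changes.
        dashed = stride != 1 or in_channels != out_channels
        size //= stride
        rows.append((name, [size, size, out_channels], dashed))
        in_channels = out_channels
    return rows


for expansion in (1, 4):
    print("expansion =", expansion)
    for row in stage_shapes(expansion):
        print("  ", row)
```

With `expansion=1` only conv3_x to conv5_x start with a dashed block; with `expansion=4` conv2_x does as well, because 64 input channels do not match the 256 output channels.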

(3) Batch Normalization

[Figure: Batch Normalization]
For details, see: Batch Normalization explained, with PyTorch experiments (CSDN blog)

(4) Transfer learning

[Figure: transfer learning]

II. Building the model

1. Building the ResNet model

1) Building the `BasicBlock` module

This is the residual structure used by ResNet18 and ResNet34: 3x3 conv - 3x3 conv

import torch
import torch.nn as nn


# Residual structure for ResNet18 / ResNet34
class BasicBlock(nn.Module):
    expansion = 1  # output channels = out_channel * expansion (unchanged here)

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # 1x1 conv + BN for dashed blocks, otherwise None

    def forward(self, x):
        identity = x  # output of the shortcut branch
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity  # shapes must match for the element-wise sum
        out = self.relu(out)
        return out

2) Building the `Bottleneck` module

This is the residual structure used by ResNet50, ResNet101 and ResNet152: 1x1 conv - 3x3 conv - 1x1 conv

# Residual structure for ResNet50 / ResNet101 / ResNet152
class Bottleneck(nn.Module):
    expansion = 4  # the third conv has 4x as many kernels as the first and second

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv: squeeze the channels
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        # ----------------------------------
        # 3x3 conv: carries the stride, as in the official PyTorch implementation;
        # padding=1 keeps H and W unchanged when stride is 1
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # ----------------------------------
        # 1x1 conv: restore the channels
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        # ----------------------------------

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x  # output of the shortcut branch
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

3) Building the `ResNet` module

With the building blocks above in place, we can now write a template for the ResNet network:

class ResNet(nn.Module):
    # blocks_num: number of residual structures per stage, e.g. [3, 4, 6, 3] for resnet34
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top  # set False to build other networks on top of the ResNet backbone
        self.in_channel = 64

        # conv1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel,
                               kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        # padding=1 is needed to get 112 -> 56 (without it the output is 55)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # residual stages conv2_x .. conv5_x
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)

        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            # channels of the last stage: 512 for resnet18/34, 2048 = 512 * 4 for the deeper networks
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        # initialize the convolutional layers
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None

        # A dashed block is needed when H/W or the channel count changes
        # (not the case for layer1 of resnet18/34)
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion)
            )
        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample, stride=stride))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):  # the first block is already built, so start from 1
            layers.append(block(self.in_channel, channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)
        return x

4) Constructing resnet18/34/50/101/152

def resnet18(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes, include_top=include_top)


def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnet152(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 8, 36, 3], num_classes=num_classes, include_top=include_top)
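The block counts passed to each constructor are what give the networks their names. As a quick check, the sketch below counts the first convolution, every convolution on the main branches, and the final fully connected layer; the 1x1 convolutions on the shortcut branches are not counted, which is the usual convention. The helper name `weighted_layers` is made up for illustration.

```python
def weighted_layers(convs_per_block: int, blocks_num: list) -> int:
    """Count conv1 + every conv in the main branches + the final fc layer."""
    return 1 + convs_per_block * sum(blocks_num) + 1


configs = {
    "resnet18": (2, [2, 2, 2, 2]),    # two-layer structure
    "resnet34": (2, [3, 4, 6, 3]),
    "resnet50": (3, [3, 4, 6, 3]),    # three-layer structure
    "resnet101": (3, [3, 4, 23, 3]),
    "resnet152": (3, [3, 8, 36, 3]),
}

depths = {name: weighted_layers(*cfg) for name, cfg in configs.items()}
print(depths)
```

In particular, [3, 8, 36, 3] with the three-layer structure gives 152 layers, so that configuration belongs to resnet152.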

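The network model is now complete.

2. Training

The steps are much the same as for training AlexNet, VGG and GoogLeNet earlier; only a few parts need to change:

1) Download the pretrained weights
Look for them in torchvision.models: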

from torchvision.models import resnet

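Click through resnet to open its source and find resnet34; the download link is given there: https://download.pytorch.org/models/resnet34-b627a593.pth
After downloading, rename the file to resnet34-pre.pth.
2) transform
Switch to the normalization used in the official tutorial: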

data_transform = {
    "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(),
                                 transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    "val": transforms.Compose([transforms.Resize(256),
                               transforms.CenterCrop(224),
                               transforms.ToTensor(),
                               transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}
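To make the effect of the normalization concrete, the sketch below reproduces what ToTensor followed by Normalize does to a single RGB pixel in plain Python: scale from [0, 255] to [0, 1], then subtract the per-channel mean and divide by the per-channel standard deviation. The helper name `to_tensor_and_normalize` is hypothetical; the mean and std values are the ones used in the transform above.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def to_tensor_and_normalize(pixel_rgb):
    """Mimic ToTensor (divide by 255) followed by Normalize for one pixel."""
    scaled = [value / 255 for value in pixel_rgb]
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]


white = to_tensor_and_normalize([255, 255, 255])
black = to_tensor_and_normalize([0, 0, 0])
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

The network therefore sees inputs roughly in the range -2.1 to 2.6, the same range the pretrained weights were trained on.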

3) Define the model and load the weights

# define the model
net = resnet34()
net = net.to(device)
# load the pretrained weights
model_weight_path = "./resnet34-pre.pth"
assert os.path.exists(model_weight_path), "file {} does not exist".format(model_weight_path)
net.load_state_dict(torch.load(model_weight_path, map_location=device))

One problem remains: the loaded model classifies 1000 categories, while our own dataset has only 5, so the fully connected layer has to be replaced:

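in_channel = net.fc.in_features  # depth of the input feature vector
net.fc = nn.Linear(in_channel, 5).to(device)  # the new layer must sit on the same device as the rest of the model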
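The head replacement can be tried in isolation on a small stand-in network. In the sketch below, `TinyNet` is a hypothetical toy model with an `fc` attribute, used only so the example runs without the pretrained file; it is not the article's ResNet. It shows that swapping `fc` changes the output width to 5 while leaving the earlier (pretrained) weights untouched.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with an `fc` head."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d((1, 1)))
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(torch.flatten(self.features(x), 1))


net = TinyNet()
backbone_before = net.features[0].weight.clone()

# Swap the 1000-way head for a 5-way head, keeping its input width.
in_channel = net.fc.in_features
net.fc = nn.Linear(in_channel, 5)

out = net(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 5])

backbone_unchanged = torch.equal(backbone_before, net.features[0].weight)
print(backbone_unchanged)  # True
```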

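If you do not want to use transfer learning, the two lines that define the model are enough; in that case create it with resnet34(num_classes=5).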

[Figure: training output]
Accuracy reached 90% after a single epoch, which shows how effective transfer learning is!

3. Prediction

As with the earlier AlexNet, VGG and GoogLeNet articles, only a few parts need to change:
1) transform

data_transform = transforms.Compose([
    transforms.Resize(256),  # keep the aspect ratio and scale the shorter side to 256
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

2) Load the model

# create model
model = resnet34(num_classes=5).to(device)

# load model weights
weights_path = "./ResNet.pth"
assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
missing_keys, unexpected_keys = model.load_state_dict(torch.load(weights_path, map_location=device),
                                                      strict=False)
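With strict=False, load_state_dict does not raise on mismatched keys; it returns them so you can inspect what was skipped. The toy example below (a hypothetical one-layer model, not the article's checkpoint) removes one tensor from a state dict and adds a foreign one to show what ends up in each list.

```python
import torch.nn as nn

# Hypothetical toy model; its state dict keys are "0.weight" and "0.bias".
model = nn.Sequential(nn.Linear(4, 3))

state = model.state_dict()
del state["0.bias"]                                 # simulate a checkpoint that lacks one tensor
state["extra.weight"] = state["0.weight"].clone()   # and carries one the model does not have

missing_keys, unexpected_keys = model.load_state_dict(state, strict=False)
print(missing_keys)     # ['0.bias']
print(unexpected_keys)  # ['extra.weight']
```

If both lists are empty after loading, every tensor in the model was restored from the file.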

Output:

class: daisy        prob: 6.81e-05
class: dandelion    prob: 1.74e-05
class: roses        prob: 1.63e-05
class: sunflowers   prob: 5.68e-05
class: tulips       prob: 1.0
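The printed probabilities add up to roughly 1, which is what a softmax over the network's five output values produces. A minimal sketch, assuming the prediction script applies softmax to the logits; the logit values below are made up for illustration and are not the model's actual outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
logits = [-2.1, -3.4, -3.5, -2.3, 7.5]  # hypothetical values

probs = softmax(logits)
for name, p in zip(classes, probs):
    print("class: {:10}   prob: {:.3}".format(name, p))

predicted = classes[probs.index(max(probs))]
print(predicted)  # tulips
```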
