ResNet50 Reproduction Notes (PyTorch Version)

Contents

1. ResNet overview

2. Network structure

3. Training the model


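1. ResNet overview

ResNet is short for Residual Network. This family of networks is widely used for image classification and as the backbone network in other computer vision tasks; typical members are resnet50 and resnet101. ResNet showed that networks can be made much deeper (with more hidden layers) and still be trained effectively.

Original paper: https://arxiv.org/abs/1512.03385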

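2. Network structure

As the figure shows, resnet50 consists of five major parts: conv1, conv2_x, conv3_x, conv4_x, and conv5_x. The 50 in the name counts the layers that carry weights: 1 + 3*3 + 4*3 + 6*3 + 3*3 + 1 = 50, that is, the first convolution, four groups of bottleneck blocks with three convolutions per block, and the final fully connected layer. Pooling layers have no weights and are not counted.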
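
The depth count can be checked with a few lines of arithmetic. This is a minimal sketch that counts only weighted layers (the first convolution, three convolutions per bottleneck block, and the final fully connected layer); the function name `resnet_depth` is ours, and the 1x1 convolutions on the shortcut branches are left out of the count.

```python
def resnet_depth(blocks_per_stage, convs_per_block=3):
    """Count weighted layers: first conv + convs inside blocks + final fc."""
    first_conv = 1
    final_fc = 1
    block_convs = convs_per_block * sum(blocks_per_stage)
    return first_conv + block_convs + final_fc


depth_50 = resnet_depth([3, 4, 6, 3])    # block counts used by resnet50
depth_101 = resnet_depth([3, 4, 23, 3])  # block counts used by resnet101
print(depth_50, depth_101)  # 50 101
```

The same function gives 101 for the resnet101 block counts, which suggests the naming follows this counting rule.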

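The figure below shows in detail how the tensor sizes change at each layer; it may be hard to read at normal size but is legible when zoomed in. The network is divided into 4 stages containing 3, 4, 6, and 3 blocks respectively, and each block has 3 convolutional layers.

Two kinds of block are distinguished: the identity block and the conv block (the paper describes the same distinction as identity shortcuts versus projection shortcuts). They differ as follows.

A conv block has a convolution on the shortcut branch, so it can change the number of output channels of the block, as in the first step of stage2 in the figure below.

An identity block has no operation on the shortcut branch, so its input and output channel counts are the same, as in the second and third steps of stage2 in the figure below.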
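
To make the distinction concrete, the sketch below lists every block of resnet50 with its input channels, output channels, and stride, and labels it "conv" when the shortcut needs a projection and "identity" otherwise. It assumes the layout used by the code later in these notes (64 channels after the first convolution and pooling, block widths 64/128/256/512, expansion 4); `plan_blocks` is a hypothetical helper, and the groups are numbered 1 to 4 to match `layer1` to `layer4` in that code.

```python
def plan_blocks(blocks_per_layer=(3, 4, 6, 3), expansion=4):
    """Return rows of (layer, block, kind, in_channels, out_channels, stride)."""
    rows = []
    inplanes = 64  # channels after the first 7x7 conv and max pooling
    widths = (64, 128, 256, 512)
    for layer, (planes, blocks) in enumerate(zip(widths, blocks_per_layer), start=1):
        layer_stride = 1 if layer == 1 else 2
        for block in range(1, blocks + 1):
            # Only the first block of a group may change the spatial size.
            stride = layer_stride if block == 1 else 1
            out_channels = planes * expansion
            # A projection is needed whenever the shortcut shape would differ.
            needs_projection = stride != 1 or inplanes != out_channels
            kind = "conv" if needs_projection else "identity"
            rows.append((layer, block, kind, inplanes, out_channels, stride))
            inplanes = out_channels
    return rows


rows = plan_blocks()
for row in rows:
    print(row)
```

Under these assumptions each group starts with one conv block and continues with identity blocks, 16 blocks in total.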

The figure above is the complete resnet50 model; the hard part is implementing it in code.

3. Training the model

Define the block class, which covers both the conv block and the identity block.

import torch.nn as nn

# Channel counts used inside a block:
# inplanes: number of channels entering the block
# planes: number of channels used inside the block (1/4 of the output channels)
# planes * block.expansion: number of output channels
class Bottleneck(nn.Module):
    expansion = 4  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # 1x1 conv at the block input; the stride sits here, whereas torchvision puts it on the 3x3 conv
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)  # batch normalization keeps activations in a stable range during training
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,
                               padding=1, bias=False)  # 3x3 conv in the middle of the block
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)  # 1x1 conv at the block output
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # shortcut projection; not None only for a conv block
        self.stride = stride  # stride of the first block of a stage: 1 for stage 1, 2 for the other stages

    def forward(self, x):
        residual = x
        # main branch: 1x1 -> 3x3 -> 1x1 convolutions
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        # downsample decides the shortcut: an identity block adds x directly,
        # a conv block first applies a convolution to the shortcut
        if self.downsample is not None:  # downsample is built in ResNet._make_layer
            residual = self.downsample(x)
        # add the shortcut to the main branch
        out += residual
        out = self.relu(out)

        return out
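
A quick shape check helps confirm how the two block types behave. The sketch below restates the block in condensed form so that it runs on its own (same layers in the same order, wrapped in one `nn.Sequential`); the `projection` helper is a hypothetical name for the 1x1 convolution plus batch normalization that goes on the shortcut, and the input size is an arbitrary choice.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Condensed restatement of the block above, for a standalone check."""
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        out_planes = planes * self.expansion
        self.body = nn.Sequential(
            nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(inplace=True),
            nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(planes),
            nn.ReLU(inplace=True),
            nn.Conv2d(planes, out_planes, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_planes),
        )
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        # Identity block: add x as is. Conv block: project x first.
        residual = x if self.downsample is None else self.downsample(x)
        return self.relu(self.body(x) + residual)


def projection(inplanes, out_planes, stride):
    """Hypothetical helper: 1x1 conv + BN used on the shortcut branch."""
    return nn.Sequential(
        nn.Conv2d(inplanes, out_planes, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_planes),
    )


x = torch.randn(2, 256, 56, 56)  # batch of 2, 256 channels, 56x56

# Identity block: 256 -> 64 -> 256 channels, same spatial size.
identity_shape = tuple(Bottleneck(256, 64)(x).shape)

# Conv block: 256 -> 128 -> 512 channels, spatial size halved by stride 2.
conv_block = Bottleneck(256, 128, stride=2, downsample=projection(256, 512, 2))
conv_shape = tuple(conv_block(x).shape)

print(identity_shape, conv_shape)
```

Creating `Bottleneck(256, 128, stride=2)` without a projection would fail at the addition, because the main branch and the shortcut would no longer have the same shape.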

Define the ResNet class:

import math

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):  # block is the Bottleneck class; layers gives the number of blocks in each stage
        super(ResNet, self).__init__()  # ResNet is also a subclass of nn.Module
        self.inplanes = 64  # number of channels entering the first stage
        # the convolution and pooling that come before the stages
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True)

        # 64, 128, 256, 512 are the channel counts before the 4x expansion
        # four stages; layers[i] is the number of blocks in each, and the last three stages all use stride 2
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # final pooling and fully connected layer
        self.avgpool = nn.AvgPool2d(7)  # stride defaults to the kernel size, 7
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        # weight initialization: He (Kaiming) normal based on fan-out for conv layers,
        # weight 1 and bias 0 for batch normalization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    # _make_layer builds one of the four stages of ResNet
    # block: the block class (Bottleneck)
    # planes: channel count inside the blocks of this stage (the stage outputs planes * block.expansion channels)
    # blocks: number of residual blocks in this stage, i.e. the matching entry of layers
    def _make_layer(self, block, planes, blocks, stride=1):
        # downsample handles the case where F(x) and x in H(x) = F(x) + x have different shapes:
        # it projects the block input so that the shortcut matches the main branch
        # in width, height, and channel count before the addition
        # self.inplanes is the output channel count of the previous stage;
        # planes is the channel count inside the blocks of the current stage
        downsample = None
        # a stride other than 1 changes the spatial size of the output
        # a change in channel count also requires downsample, i.e. a convolution on the shortcut branch
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        # conv block
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))  # the first block of the stage goes into the layers list
        self.inplanes = planes * block.expansion  # output channels of the conv block, which become the input channels of the identity blocks

        # identity blocks
        for i in range(1, blocks):  # append the remaining blocks of the stage, which completes it
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        # convolution and pooling before the stages
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        # the four stages
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)  # flatten each sample into a vector
        x = self.fc(x)

        return x
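
The fixed `nn.AvgPool2d(7)` ties this model to a particular input size, which is easy to see by tracing the spatial size through `forward`. The sketch below is plain arithmetic with the kernel, stride, and padding values from the class above; `out_size` and `trace_sizes` are hypothetical helper names, and a 224x224 input is assumed as the default.

```python
import math


def out_size(size, kernel, stride, padding=0, ceil_mode=False):
    """Output length along one spatial axis for a conv or pooling layer."""
    span = (size + 2 * padding - kernel) / stride
    out = (math.ceil(span) if ceil_mode else math.floor(span)) + 1
    # With ceil_mode, a last window that would start beyond the input
    # and its left padding is dropped.
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


def trace_sizes(size=224):
    """Spatial size after each part of ResNet.forward for a square input."""
    sizes = {}
    size = out_size(size, kernel=7, stride=2, padding=3)
    sizes["conv1"] = size
    size = out_size(size, kernel=3, stride=2, padding=0, ceil_mode=True)
    sizes["maxpool"] = size
    # Inside a stage only the first 1x1 conv of the first block has a stride;
    # the 3x3 conv (padding 1) and the other 1x1 convs keep the size.
    for name, stride in (("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)):
        size = out_size(size, kernel=1, stride=stride)
        sizes[name] = size
    sizes["avgpool"] = out_size(size, kernel=7, stride=7)
    return sizes


sizes_224 = trace_sizes(224)
sizes_448 = trace_sizes(448)
print(sizes_224)
print(sizes_448)
```

For a 224x224 input the last stage outputs 7x7, so the pooling yields 1x1 and the flattened vector has 2048 entries, which matches `self.fc`. For a 448x448 input the pooling yields 2x2 and the fully connected layer would receive the wrong number of features; `nn.AdaptiveAvgPool2d((1, 1))` is one common way to remove that dependence.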

Main function:

def resnet50():
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    # feature extractor: first convolution and pooling plus the first three stages
    features = [model.conv1, model.bn1, model.relu, model.maxpool, model.layer1, model.layer2, model.layer3]
    # classifier part: the last stage and average pooling (model.fc is not included)
    classifier = [model.layer4, model.avgpool]
    features = nn.Sequential(*features)
    classifier = nn.Sequential(*classifier)
    return features, classifier
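
To get a feel for how the split divides the model, the learnable parameters of each part can be counted by hand. The sketch below assumes the layer shapes defined above (convolutions without bias, two learnable values per batch normalization channel, 1000 output classes); all function names are ours, and batch normalization running statistics are left out because they are buffers rather than parameters.

```python
def conv_params(c_in, c_out, kernel):
    """Weights of a conv layer without bias."""
    return c_in * c_out * kernel * kernel


def bn_params(channels):
    """Learnable scale and shift of a batch normalization layer."""
    return 2 * channels


def bottleneck_params(inplanes, planes, with_downsample, expansion=4):
    out = planes * expansion
    total = conv_params(inplanes, planes, 1) + bn_params(planes)
    total += conv_params(planes, planes, 3) + bn_params(planes)
    total += conv_params(planes, out, 1) + bn_params(out)
    if with_downsample:
        # 1x1 conv + BN on the shortcut of a conv block
        total += conv_params(inplanes, out, 1) + bn_params(out)
    return total


def layer_params(inplanes, planes, blocks, expansion=4):
    """One conv block followed by (blocks - 1) identity blocks."""
    out = planes * expansion
    first = bottleneck_params(inplanes, planes, with_downsample=True)
    rest = (blocks - 1) * bottleneck_params(out, planes, with_downsample=False)
    return first + rest


stem = conv_params(3, 64, 7) + bn_params(64)
layers = [
    layer_params(64, 64, 3),
    layer_params(256, 128, 4),
    layer_params(512, 256, 6),
    layer_params(1024, 512, 3),
]
fc = 2048 * 1000 + 1000  # weights plus bias of the final linear layer

features_total = stem + sum(layers[:3])  # conv1, bn1, layer1..layer3
classifier_total = layers[3]             # layer4 (avgpool has no parameters)
model_total = features_total + classifier_total + fc

print(features_total, classifier_total, fc, model_total)
# 8543296 14964736 2049000 25557032
```

Under these assumptions the last stage alone holds more parameters than everything placed in `features`, and the roughly two million parameters of `model.fc` are not part of either returned module.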

 

