Computer Vision: ImageNet, AlexNet, VGGNet, and ResNet Model Architectures

1. ImageNet

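  • The full ImageNet dataset contains 14,197,122 images in about 21,000 categories. The subset used in the ILSVRC competition (often called ImageNet-1K), which is what "ImageNet" usually means for the models below, has 1000 classes.
  • In image processing, ImageNet is commonly used to pre-train networks, for two main reasons:
    • First, ImageNet is one of the largest collections of pre-labeled training data in the image field, and more data generally yields more reliable parameters;
    • Second, with 1000 classes it is a general-purpose image dataset that is not tied to any particular domain, so pre-trained weights transfer well to many tasks.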

2. AlexNet

2.1 How AlexNet works

  • AlexNet is an 8-layer deep network with 5 convolutional layers and 3 fully connected layers, not counting the LRN and pooling layers.

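  • AlexNet's per-layer hyperparameters are as follows. The input size is 227×227. The first convolution uses a large 11×11 kernel with stride 4 and 96 kernels, followed by an LRN layer and then a 3×3 max-pooling layer with stride 2. The later convolutional layers use smaller 5×5 or 3×3 kernels with stride 1, so they scan every pixel of the feature map, and the max-pooling layers remain 3×3 with stride 2. Notice that the first few convolutional layers carry a large share of the computation but very few parameters, only a small fraction of AlexNet's total. This is the advantage of convolutional layers: they extract useful features with relatively few parameters. The AlexNet paper also reports that removing any of the middle convolutional layers costs about 2% of top-1 accuracy, so every convolutional layer matters.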
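The size and parameter claims above can be checked with a short calculation. The plain-Python sketch below applies the standard output-size formula floor((W - K + 2P) / S) + 1 to a 227×227 input, then counts weights and biases layer by layer. For the parameter count it assumes the torchvision channel widths used in section 2.2 (64/192/384/256/256) rather than the 96-kernel original, because the original two-GPU grouping complicates the count.

```python
def out_size(w: int, k: int, s: int = 1, p: int = 0) -> int:
    """Spatial output size of a conv/pool layer: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

# (name, kernel, stride, padding), following the text: 227x227 input, no padding on conv1
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]
w = 227
sizes = []
for _name, k, s, p in layers:
    w = out_size(w, k, s, p)
    sizes.append(w)
print(sizes)  # [55, 27, 27, 13, 13, 13, 13, 6]

def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out   # weights + biases

def fc_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out           # weights + biases

# Assumption: channel widths of the torchvision variant in section 2.2
conv_total = (conv_params(3, 64, 11) + conv_params(64, 192, 5) + conv_params(192, 384, 3)
              + conv_params(384, 256, 3) + conv_params(256, 256, 3))
fc_total = fc_params(256 * 6 * 6, 4096) + fc_params(4096, 4096) + fc_params(4096, 1000)
print(conv_total, fc_total)                             # 2469696 58631144
print(f"{conv_total / (conv_total + fc_total):.1%}")    # 4.0%
```

The final 6×6 map with 256 channels is where the `256 * 6 * 6` input size of the first `Linear` layer comes from, and the fully connected layers hold roughly 96% of the parameters.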

2.2 AlexNet implementation

import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url
from typing import Any


__all__ = ['AlexNet', 'alexnet']


model_urls = {
    'alexnet': 'https://download.pytorch.org/models/alexnet-owt-4df8aa71.pth',
}


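class AlexNet(nn.Module):
    # Note: this is the torchvision variant based on the "One weird trick" paper. It uses
    # 64/192/384/256/256 conv kernels instead of the original 96/256/384/384/256 and has no LRN layers.
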
    def __init__(self, num_classes: int = 1000) -> None:
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


def alexnet(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> AlexNet:
    r"""AlexNet model architecture from the
    `"One weird trick..." <https://arxiv.org/abs/1404.5997>`_ paper.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    model = AlexNet(num_classes=num_classes, **kwargs)
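    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['alexnet'],
                                              progress=progress)
        if num_classes != 1000:
            # The pretrained head has 1000 outputs; drop it so only the matching layers are loaded.
            # Use pop(key): OrderedDict.popitem() takes no key and just removes the last entry.
            state_dict.pop("classifier.6.weight")
            state_dict.pop("classifier.6.bias")
        model.load_state_dict(state_dict, strict=False)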
    return model
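The `pretrained` branch above follows a common fine-tuning pattern: load the ImageNet weights, discard the 1000-class output layer, and load the rest with `strict=False`. The sketch below shows the same pattern on a hypothetical three-layer stand-in model (`make_net` is invented for illustration), so it runs without downloading anything. Removing the head entries is needed even with `strict=False`, because `load_state_dict` still raises an error on a shape mismatch; `strict=False` only tolerates missing or unexpected keys.

```python
import torch
import torch.nn as nn

def make_net(num_classes: int) -> nn.Sequential:
    # Hypothetical stand-in: a "backbone" (module 0) followed by a classification head (module 2)
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, num_classes))

torch.manual_seed(0)
pretrained = make_net(1000).state_dict()   # pretend these are ImageNet weights
model = make_net(10)                        # new task with 10 classes

# The head's shape (1000 outputs) no longer matches, so remove its entries by key.
for key in ("2.weight", "2.bias"):
    pretrained.pop(key)

# strict=False tolerates the missing head keys; the new head keeps its random initialization.
result = model.load_state_dict(pretrained, strict=False)
print(sorted(result.missing_keys))                           # ['2.bias', '2.weight']
print(torch.equal(model[0].weight, pretrained["0.weight"]))  # True
```

In AlexNet and VGG the head keys are `classifier.6.weight` and `classifier.6.bias`; in ResNet they are `fc.weight` and `fc.bias`.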

3. VGGNet

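3.1 Reference paper

Karen Simonyan and Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition": https://arxiv.org/pdf/1409.1556.pdf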

3.2 How VGGNet works

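  • The paper tests networks of several depths, labeled A to E, with 11 to 19 weight layers; D and E are known as VGG16 and VGG19. The configurations are listed in Table 1 of the paper; the cfgs dictionary in section 3.3 encodes A, B, D, and E.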
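  • VGGNet replaces AlexNet's large kernels with small 3×3 kernels with stride=1. Two stacked 3×3 convolutions have the same receptive field as one 5×5 convolution, and three stacked 3×3 convolutions have the same receptive field as one 7×7 convolution.
  • Advantages of replacing a large kernel with a stack of small ones:
    Instead of a single non-linearity, the stack has three (one ReLU per layer), which increases the depth and non-linearity of the network and makes the decision function more discriminative.
    It reduces the number of parameters: with C input and C output channels, one 7×7 layer has 7×7×C² = 49C² weights, while three 3×3 layers have 3×(3×3×C²) = 27C². This can be viewed as regularizing the 7×7 filters by forcing them to decompose into 3×3 filters.
  • Network C adds 1×1 convolutions, which increase the non-linearity of the decision function without affecting the receptive field. Because these 1×1 convolutions have the same number of input and output channels, each is by itself a linear projection; the ReLU that follows introduces the non-linearity.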
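A quick check of the receptive-field and parameter claims above, as a plain-Python sketch. It assumes stride-1 convolutions and equal input/output channel counts C, ignores biases, and uses C = 256 only as an example value; any C gives the same 27:49 ratio.

```python
def receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 convs: each layer adds (kernel - 1)."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights of num_layers kernel x kernel convs mapping C -> C channels (biases ignored)."""
    return num_layers * kernel * kernel * channels * channels

C = 256  # example channel count (assumption)
print([receptive_field(n) for n in (1, 2, 3)])               # [3, 5, 7]
print(receptive_field(1, kernel=1))                          # 1: a 1x1 conv adds no receptive field
print(conv_weights(3, C, num_layers=3), conv_weights(7, C))  # 1769472 3211264
```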

3.3 VGGNet implementation

import torch
import torch.nn as nn
from torch.hub import load_state_dict_from_url
from typing import Union, List, Dict, Any, cast


__all__ = [
    'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',
    'vgg19_bn', 'vgg19',
]


model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}


class VGG(nn.Module):

    def __init__(
        self,
        features: nn.Module,
        num_classes: int = 1000,
        init_weights: bool = True
    ) -> None:
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self) -> None:
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)


def make_layers(cfg: List[Union[str, int]], batch_norm: bool = False) -> nn.Sequential:
    layers: List[nn.Module] = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            v = cast(int, v)
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


cfgs: Dict[str, List[Union[str, int]]] = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


def _vgg(arch: str, cfg: str, batch_norm: bool, pretrained: bool, progress: bool, num_classes: int, **kwargs: Any) -> VGG:
    if pretrained:
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg], batch_norm=batch_norm), num_classes=num_classes, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        if num_classes != 1000:
            # Drop the 1000-class head; pop(key) removes the named entry (popitem() would not).
            state_dict.pop("classifier.6.weight")
            state_dict.pop("classifier.6.bias")
        model.load_state_dict(state_dict, strict=False)
    return model


def vgg11(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', False, pretrained, progress, num_classes=num_classes, **kwargs)


def vgg11_bn(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 11-layer model (configuration "A") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11_bn', 'A', True, pretrained, progress, num_classes, **kwargs)


def vgg13(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg13', 'B', False, pretrained, progress, num_classes, **kwargs)


def vgg13_bn(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 13-layer model (configuration "B") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg13_bn', 'B', True, pretrained, progress, num_classes, **kwargs)


def vgg16(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg16', 'D', False, pretrained, progress, num_classes, **kwargs)


def vgg16_bn(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 16-layer model (configuration "D") with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg16_bn', 'D', True, pretrained, progress, num_classes, **kwargs)


def vgg19(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration "E")
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg19', 'E', False, pretrained, progress, num_classes, **kwargs)


def vgg19_bn(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> VGG:
    r"""VGG 19-layer model (configuration 'E') with batch normalization
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg19_bn', 'E', True, pretrained, progress, num_classes, **kwargs)
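The cfgs lists above fully determine each network's depth and output size. The plain-Python sketch below copies them and counts the layers: each integer is one 3×3 conv layer, each 'M' is a 2×2 stride-2 max pool, and the three Linear layers of VGG.classifier are added at the end. It assumes a 224×224 input, the standard ImageNet crop size.

```python
from typing import Dict, List, Tuple, Union

# Copied from the cfgs dictionary in the implementation above
cfgs: Dict[str, List[Union[str, int]]] = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M',
          512, 512, 512, 512, 'M'],
}

def describe(cfg: List[Union[str, int]], input_size: int = 224) -> Tuple[int, int]:
    convs = sum(1 for v in cfg if v != 'M')    # each int is one 3x3 conv layer
    pools = cfg.count('M')                     # each 'M' is a 2x2, stride-2 max pool
    depth = convs + 3                          # + the 3 Linear layers in VGG.classifier
    final_size = input_size // 2 ** pools      # padding=1 keeps the size; only pools shrink it
    return depth, final_size

for name, cfg in cfgs.items():
    print(name, describe(cfg))   # prints A (11, 7), B (13, 7), D (16, 7), E (19, 7), one per line
```

The final 7×7 map matches `nn.AdaptiveAvgPool2d((7, 7))` and the `512 * 7 * 7` input of the first Linear layer, so for 224×224 inputs the adaptive pool leaves the features unchanged.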

4. ResNet

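4.1 Reference paper

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep Residual Learning for Image Recognition": https://arxiv.org/pdf/1512.03385.pdf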

4.2 The degradation problem in deep networks

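  • Experiments show that deep networks suffer from a degradation problem: as depth increases, accuracy saturates and then even drops. Figure 1 of the ResNet paper shows this directly: a 56-layer network performs worse than a 20-layer one. This is not overfitting, because the 56-layer network also has higher training error. Deep networks are known to suffer from vanishing or exploding gradients, which makes them hard to train, but techniques such as BatchNorm already mitigate this. The degradation problem is therefore surprising.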

4.3 How ResNet works

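  • Suppose we build a deeper network by adding new layers to an existing shallow network. In the extreme case, the added layers learn nothing and simply copy the shallow network's features, i.e. they are identity mappings. In that case the deeper network performs at least as well as the shallow one, yet in practice deeper networks degrade.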
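  • For a stack of layers with input $x$, denote the feature it learns by $H(x)$. We would like it to learn the residual $F(x)=H(x)-x$ instead, so that the original feature becomes $F(x)+x$. When the residual is 0, the stacked layers just perform an identity mapping, so at least the network does not get worse; in practice the residual is not 0, so the stacked layers learn new features on top of the input and perform better. This residual learning structure is similar to a shortcut connection (Figure 2 of the ResNet paper).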
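  • Why residual learning is easier: intuitively, the residual is usually small, so there is less to learn and learning is easier. A residual unit can be written as
    $$y_l = x_l + F(x_l, W_l) \tag{1}$$
    $$x_{l+1} = f(y_l) \tag{2}$$
    where $x_l$ and $x_{l+1}$ are the input and output of the $l$-th residual unit (each unit usually contains several layers), $F$ is the residual function, i.e. the learned residual, and $f$ is the ReLU activation. If $f$ is further taken to be an identity mapping, so that $x_{l+1} = y_l$ (the simplification analyzed in the follow-up paper "Identity Mappings in Deep Residual Networks"), applying Eq. (1) recursively gives the features from a shallow layer $l$ to a deep layer $L$:
    $$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i) \tag{3}$$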
  • By the chain rule, the backpropagated gradient is
    $$\frac{\partial \text{loss}}{\partial x_l} = \frac{\partial \text{loss}}{\partial x_L} \cdot \frac{\partial x_L}{\partial x_l} = \frac{\partial \text{loss}}{\partial x_L} \cdot \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i)\right) \tag{4}$$
    Here $\frac{\partial \text{loss}}{\partial x_L}$ is the gradient of the loss arriving at layer $L$. The 1 in the parentheses shows that the shortcut propagates the gradient without loss, while the other term, the residual gradient, has to pass through the weighted layers. The residual gradient is unlikely to be exactly -1 for every sample, and even when it is small, the 1 keeps the gradient from vanishing. This is why residual learning is easier.
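  • ResNet builds on the VGG19 design: the paper modifies it and adds residual units through shortcut connections (Figure 3 of the paper).
  • The main changes are that ResNet downsamples directly with stride=2 convolutions, and replaces VGG's stack of fully connected layers with a global average pooling layer followed by a single fully connected classification layer.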
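  • Key design principles of ResNet:
    1) When the feature map size is halved, the number of feature maps (channels) is doubled, which preserves the time complexity of each layer;
    2) Compared with the plain network, ResNet adds a shortcut every two layers, which forms residual learning; in the paper's figure, dashed shortcuts mark places where the number of feature maps changes;
    3) Deeper networks can be built as listed in Table 1 of the paper. For ResNet-18 and ResNet-34, each residual unit spans two layers; for the deeper networks (ResNet-50 and beyond), each unit spans three layers with 1x1, 3x3, and 1x1 kernels;
    4) Notably, the hidden (bottleneck) layers of the three-layer units have few feature maps: 1/4 of the unit's output count.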
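  • Next, consider the residual units themselves. ResNet uses two kinds (Figure 5 of the paper): the left one is the residual block for the shallower networks, and the right one, the bottleneck block, is for the deeper networks.
    1) For the shortcut connection, when the input and output dimensions match, the input can be added directly to the output;
    2) When the dimensions differ (here, the number of channels doubles), they cannot be added directly. There are two strategies:
    • Zero-padding to increase the dimensions. A downsampling step is usually needed first, for example a pooling with stride=2; this adds no parameters;
    • A projection shortcut, usually a 1x1 convolution; this adds parameters and computation. Instead of the identity mapping, every shortcut could of course use a projection shortcut.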
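Two small PyTorch sketches for the residual mechanics above, under simplifying assumptions. First, a toy scalar chain (hypothetical: each layer is F(x) = w·x, with no ReLU) illustrates Eq. (4) numerically: without shortcuts the gradient is w^n and vanishes, with shortcuts it is (1 + w)^n. Second, the two shortcut options for mismatched dimensions: `ZeroPadShortcut` (a name made up here) subsamples with stride 2 and zero-pads the extra channels, adding no parameters, while `projection_shortcut` uses a 1x1 convolution plus BatchNorm, the same structure `_make_layer` builds as `downsample` in section 4.4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# --- Eq. (4) on a toy scalar chain (hypothetical layers F(x) = w * x, no ReLU) ---
def chain_gradient(num_layers: int, w: float, residual: bool) -> float:
    x = torch.tensor(1.0, requires_grad=True)
    h = x
    for _ in range(num_layers):
        f = w * h
        h = h + f if residual else f   # residual unit: x_{l+1} = x_l + F(x_l)
    h.backward()
    return x.grad.item()

plain = chain_gradient(20, 0.1, residual=False)   # w**20, about 1e-20: vanishes
resid = chain_gradient(20, 0.1, residual=True)    # (1 + w)**20, about 6.73: does not vanish

# --- Shortcut options when channels double and H, W halve ---
class ZeroPadShortcut(nn.Module):
    """Option 1 (parameter-free): stride-2 subsampling, then zero-pad the extra channels."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2) -> None:
        super().__init__()
        self.stride = stride
        self.extra = out_channels - in_channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same as a 1x1 pooling with stride 2: keep every other row and column.
        x = x[:, :, ::self.stride, ::self.stride]
        # F.pad pads from the last dim backwards: (W_left, W_right, H_top, H_bottom, C_front, C_back)
        return F.pad(x, (0, 0, 0, 0, 0, self.extra))

def projection_shortcut(in_channels: int, out_channels: int, stride: int = 2) -> nn.Module:
    """Option 2: 1x1 conv + BN, matching the downsample branch built in _make_layer."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )

x = torch.randn(1, 64, 8, 8)
pad, proj = ZeroPadShortcut(64, 128), projection_shortcut(64, 128)
print(pad(x).shape, proj(x).shape)                # both torch.Size([1, 128, 4, 4])
print(sum(p.numel() for p in pad.parameters()),   # 0
      sum(p.numel() for p in proj.parameters()))  # 8448 = 64*128 conv + 2*128 BN
```

The trade-off matches the text: the zero-padding shortcut is free but the padded channels carry no information from the input, while the projection learns how to map the input at the cost of extra parameters and computation.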

4.4 ResNet implementation

import torch
from torch import Tensor
import torch.nn as nn
from torch.hub import load_state_dict_from_url
from typing import Type, Any, Callable, Union, List, Optional


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152', 'resnext50_32x4d', 'resnext101_32x8d',
           'wide_resnet50_2', 'wide_resnet101_2']

model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}


def conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1) -> nn.Conv2d:
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2d:
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion: int = 1

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
    # while original implementation places the stride at the first 1x1 convolution(self.conv1)
    # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
    # This variant is also known as ResNet V1.5 and improves accuracy according to
    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.

    expansion: int = 4

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(
        self,
        block: Type[Union[BasicBlock, Bottleneck]],
        layers: List[int],
        num_classes: int = 1000,
        zero_init_residual: bool = False,
        groups: int = 1,
        width_per_group: int = 64,
        replace_stride_with_dilation: Optional[List[bool]] = None,
        norm_layer: Optional[Callable[..., nn.Module]] = None
    ) -> None:
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int, blocks: int,
                    stride: int = 1, dilate: bool = False) -> nn.Sequential:
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def _resnet(
    arch: str,
    block: Type[Union[BasicBlock, Bottleneck]],
    layers: List[int],
    pretrained: bool,
    progress: bool,
    num_classes: int,
    **kwargs: Any
) -> ResNet:
    model = ResNet(block, layers, num_classes=num_classes, **kwargs)
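    if pretrained:
        state_dict = load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        if num_classes != 1000:
            # Drop the 1000-class fc head; pop(key) removes the named entry (popitem() would not).
            state_dict.pop("fc.weight")
            state_dict.pop("fc.bias")
        model.load_state_dict(state_dict, strict=False)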
    return model


def resnet18(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNet-18 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress, num_classes,
                   **kwargs)


def resnet34(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNet-34 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress, num_classes,
                   **kwargs)


def resnet50(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNet-50 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress, num_classes,
                   **kwargs)


def resnet101(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNet-101 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress, num_classes,
                   **kwargs)


def resnet152(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNet-152 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress, num_classes,
                   **kwargs)


def resnext50_32x4d(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNeXt-50 32x4d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 4
    return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],
                   pretrained, progress, num_classes, **kwargs)


def resnext101_32x8d(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""ResNeXt-101 32x8d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 8
    return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
                   pretrained, progress, num_classes, **kwargs)


def wide_resnet50_2(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""Wide ResNet-50-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3],
                   pretrained, progress, num_classes, **kwargs)


def wide_resnet101_2(pretrained: bool = False, progress: bool = True, num_classes: int = 1000, **kwargs: Any) -> ResNet:
    r"""Wide ResNet-101-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3],
                   pretrained, progress, num_classes, **kwargs)
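The block counts passed to `_resnet` above are what produce the depths in the network names. The plain-Python sketch below copies those counts and recomputes each depth as the 7x7 stem convolution, plus the conv layers inside all residual blocks (2 per BasicBlock, 3 per Bottleneck), plus the final fc layer; the 1x1 projection shortcuts are not counted, following the usual convention. It also shows how the output channels double per stage, with the Bottleneck's `expansion = 4` making its hidden width 1/4 of the output.

```python
from typing import Dict, List, Tuple

# Block type and per-stage block counts, copied from the resnet* functions above
CONFIGS: Dict[str, Tuple[str, List[int]]] = {
    "resnet18": ("basic", [2, 2, 2, 2]),
    "resnet34": ("basic", [3, 4, 6, 3]),
    "resnet50": ("bottleneck", [3, 4, 6, 3]),
    "resnet101": ("bottleneck", [3, 4, 23, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}   # BasicBlock: 3x3, 3x3; Bottleneck: 1x1, 3x3, 1x1
EXPANSION = {"basic": 1, "bottleneck": 4}         # the `expansion` class attributes

def depth(block: str, counts: List[int]) -> int:
    # stem conv1 + conv layers inside all residual blocks + final fc
    return 1 + CONVS_PER_BLOCK[block] * sum(counts) + 1

def stage_out_channels(block: str) -> List[int]:
    # planes double at each stage (64 -> 512) while layer2-layer4 halve the spatial size
    return [planes * EXPANSION[block] for planes in (64, 128, 256, 512)]

for name, (block, counts) in CONFIGS.items():
    print(name, depth(block, counts), stage_out_channels(block))
```

The last stage's output width, 512 or 2048, is exactly the `512 * block.expansion` input size of `self.fc`.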

Source code reference: https://blog.csdn.net/supergxt/article/details/121955782
