Deep Learning Week 3 Study Summary

Author: 逸__

Last modified 2022-07-27 22:35:06


Tags: study

First published 2022-07-27 21:54:07

Article link: https://blog.csdn.net/wlj0725/article/details/125982924


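I. ResNet

1. Introduction to ResNet

ResNet was proposed by Microsoft Research in 2015. That year it took first place in both the classification task and the object detection task of the ImageNet competition.

Suppose we deepen a network by stacking more convolutional and pooling layers. Will it then perform better than a shallower one? The paper "Deep Residual Learning for Image Recognition" answers this question.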

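The figure above is taken from the original paper. It shows that the 56-layer network has higher error than the 20-layer network on both the training set and the test set. The authors give two explanations.

The first is vanishing or exploding gradients. As the network gets deeper, suppose the error gradient passed from layer to layer is smaller (larger) than 1. Then the gradient keeps shrinking (growing) during backpropagation, i.e. it vanishes (explodes), and this degrades the network's performance. The problem can be addressed with data standardization, careful weight initialization and batch normalization. Batch normalization standardizes each channel of a batch of data to a distribution with mean 0 and variance 1 (see the related blog post for details).

The second is the degradation problem. Even once the first problem is solved, the deeper network still performs worse. To solve this degradation problem, the authors proposed the residual structure. The figure below shows the two residual structures. The one on the left is mainly used in shallower networks, and the one on the right mainly in deeper networks.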
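A toy calculation makes the first explanation concrete. It assumes, purely for illustration, that every layer multiplies the backpropagated gradient by the same constant factor. Real networks do not behave exactly like this, and the function name is hypothetical.

```python
# Toy model (an assumption, not from the paper): every layer scales the
# backpropagated gradient by the same factor, so the gradient reaching the
# first layer is that factor multiplied once per layer (chain rule).
def gradient_at_first_layer(per_layer_factor: float, num_layers: int) -> float:
    grad = 1.0  # gradient arriving at the output layer
    for _ in range(num_layers):
        grad *= per_layer_factor  # multiply by this layer's local gradient
    return grad


for factor in (0.9, 1.1):
    for depth in (20, 56):
        value = gradient_at_first_layer(factor, depth)
        print(f"factor={factor}, depth={depth}: {value:.3e}")
```

With a factor slightly below 1, the 56-layer gradient is far smaller than the 20-layer one. With a factor slightly above 1, it blows up instead.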

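The residual structure adds the input to the output, so the two must have the same shape. A parameter count shows that the right-hand structure has fewer parameters than the left-hand one when both work on the same number of channels. This is what makes it usable in much deeper networks.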
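A quick parameter count backs this up. The sketch below makes three assumptions:

- both blocks take and return 256 channels;
- the bottleneck squeezes to 64 channels, as in the paper's right-hand figure;
- biases and batch-normalization parameters are ignored.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution without bias (BatchNorm parameters ignored)."""
    return k * k * c_in * c_out


channels = 256  # assumed input/output width for both blocks

# Left structure: two stacked 3x3 convolutions, 256 -> 256 -> 256
basic = conv_params(3, channels, channels) + conv_params(3, channels, channels)

# Right structure: 1x1 squeeze to 64, 3x3 at 64, 1x1 expand back to 256
bottleneck = (conv_params(1, channels, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, channels))

print(basic, bottleneck)  # 1179648 69632
```

Under these assumptions, the bottleneck needs roughly 17 times fewer weights.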

The paper gives the structures of five networks, with 18, 34, 50, 101 and 152 layers.
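The number in each name counts the weighted layers on the main path: the first convolution, the convolutions inside the residual blocks, and the final fully connected layer. A small sketch follows. It takes the per-stage block counts from the paper's table and does not count the 1×1 shortcut convolutions; the helper name is hypothetical.

```python
# name -> (conv layers per residual block, blocks in conv2_x..conv5_x)
configs = {
    "resnet18": (2, [2, 2, 2, 2]),    # BasicBlock: two 3x3 convs
    "resnet34": (2, [3, 4, 6, 3]),
    "resnet50": (3, [3, 4, 6, 3]),    # Bottleneck: 1x1, 3x3, 1x1
    "resnet101": (3, [3, 4, 23, 3]),
    "resnet152": (3, [3, 8, 36, 3]),
}


def weighted_depth(layers_per_block: int, blocks: list) -> int:
    # 1 for the initial 7x7 conv + block convs + 1 for the final fc layer
    return 1 + layers_per_block * sum(blocks) + 1


for name, (layers_per_block, blocks) in configs.items():
    print(name, weighted_depth(layers_per_block, blocks))
```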

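The detailed structure of the 34-layer network is shown below.

Residual blocks drawn with a solid shortcut have identical input and output shapes, so the two can be added directly. Blocks drawn with a dashed shortcut have different input and output shapes. Their shortcut must change the channel count, through the number of convolution kernels, and the feature map's height and width, through the kernel size and stride.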
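The height/width change on a dashed shortcut follows the standard convolution output-size formula. The sketch below uses a hypothetical helper, `conv_out`. It checks that both branches of the first Conv3.x block in ResNet-34 turn a 56×56 map into 28×28:

- the main branch;
- the 1×1, stride-2 shortcut.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# First block of Conv3.x in ResNet-34: 56x56x64 in, 28x28x128 out.
main = conv_out(56, kernel=3, stride=2, padding=1)  # main branch: 3x3 conv, stride 2
shortcut = conv_out(56, kernel=1, stride=2)         # dashed shortcut: 1x1 conv, stride 2
print(main, shortcut)  # 28 28
```

The shortcut's 1×1 convolution uses 128 kernels, so its channel count also matches the main branch and the element-wise addition is valid.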

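2. Building ResNet with PyTorch

1.2.1 The two residual blocks

In the 18- and 34-layer networks, each residual block outputs as many channels as its convolutions have kernels. In the other three networks, a block outputs four times as many channels as there are kernels in its first convolution. The two block types are defined below; the class attribute `expansion` sets this multiplier.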

import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):

    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        # out_channel is the width of the 1x1 and 3x3 convs; conv3 expands it by self.expansion
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

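1.2.2 Building the network

The original image has 3 channels. It passes through a convolution, batch normalization, ReLU and max pooling, and the result becomes the input to Conv2.x. layer1, layer2, layer3 and layer4 correspond to Conv2.x, Conv3.x, Conv4.x and Conv5.x. The function _make_layer takes three arguments:

1) One of the two blocks above: BasicBlock for the 18- and 34-layer networks, Bottleneck for the other three.
2) The number of kernels in the first convolution of the stage's first residual block (64, 128, 256 or 512).
3) The number of residual blocks in the stage, which is 3, 4, 6 and 3 for the 34- and 50-layer networks.

Take the 50-layer network as an example. When layer1 (Conv2.x) is built, expansion=4, so self.in_channel (64) differs from channel * block.expansion (256). The if branch therefore runs and defines a downsample module, a 1×1 convolution followed by batch normalization. It raises the shortcut's channel count from 64 to 256 so that it matches the main branch and the two can be added element-wise. This block is the first of layer1's three residual blocks and is appended to the layers list. The remaining two blocks are appended afterwards and no longer need to change the channel count.

layer2 (Conv3.x), layer3 and layer4 are built the same way. In each of them the first block uses stride=2, so its downsample branch changes both the height and width of the feature map and its channel count.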
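The if condition in _make_layer can be traced by hand. The plain-Python sketch below mirrors that condition for the first block of each stage. The function `downsample_plan` is hypothetical and not part of the model code. It covers expansion=1 (BasicBlock) and expansion=4 (Bottleneck):

```python
def downsample_plan(expansion: int):
    """Mirror _make_layer's condition for the first block of each stage."""
    in_channel = 64  # channels after conv1 + maxpool
    plan = []
    for name, channel, stride in (("layer1", 64, 1), ("layer2", 128, 2),
                                  ("layer3", 256, 2), ("layer4", 512, 2)):
        out_channel = channel * expansion
        needs_downsample = stride != 1 or in_channel != out_channel
        plan.append((name, in_channel, out_channel, stride, needs_downsample))
        in_channel = out_channel  # later blocks in the stage see the expanded width
    return plan


for row in downsample_plan(expansion=1):  # BasicBlock (ResNet-18/34)
    print(row)
for row in downsample_plan(expansion=4):  # Bottleneck (ResNet-50/101/152)
    print(row)
```

- **BasicBlock:** only layer1 skips the downsample branch.
- **Bottleneck:** every stage's first block needs it, layer1 included, because 64 ≠ 256.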

class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64
        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x

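1.2.3 Training the network with transfer learning

Transfer learning trains your own model starting from a model that someone else has already trained. In practice it usually reaches good performance faster than training from scratch. Common approaches are:

1) load the pretrained weights and train all parameters;
2) load the weights and train only the last few layers;
3) load the weights, add a fully connected layer on top of the original network, and train only that last layer.

Pretrained ResNet weights can be downloaded from the official PyTorch site. They were trained on ImageNet, which has 1000 classes. To transfer them to your own task, change the number of outputs of the fully connected layer and then train only that last layer. At test time, the data must go through the same preprocessing as during training.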
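For the official torchvision ImageNet weights, the usual preprocessing has two steps:

1) scale pixels to [0, 1];
2) normalize each channel with the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225).

This is what `transforms.ToTensor` followed by `transforms.Normalize` does. Here is a per-pixel sketch in plain Python with a hypothetical helper name:

```python
# ImageNet statistics used by torchvision's pretrained classification models
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1] (like ToTensor), then standardize
    each channel with the ImageNet mean/std (like Normalize)."""
    return tuple((v / 255.0 - m) / s
                 for v, m, s in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD))


print(normalize_pixel((0, 0, 0)))        # black pixel
print(normalize_pixel((255, 255, 255)))  # white pixel
```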

The code below changes the fully connected layer for a task with five classes. It redefines the fully connected layer of the 34-layer ResNet.

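import os

import torch
import torch.nn as nn

# resnet34() is the model-building function defined in section IV
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = resnet34()
# load pretrain weights
# download url: https://download.pytorch.org/models/resnet34-333f7ec4.pth
model_weight_path = "./resnet34-pre.pth"
assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
# Uncomment to freeze all pretrained weights. The new fc layer created below
# is still trainable, so only the last layer would then be trained.
# for param in net.parameters():
#     param.requires_grad = False

# change fc layer structure: 512 input features -> 5 classes
in_channel = net.fc.in_features
net.fc = nn.Linear(in_channel, 5)
net.to(device)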

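II. ResNeXt

1. What ResNeXt improves

The ResNeXt paper improves ResNet's second kind of residual block, the bottleneck, with grouped convolution. Grouped convolution works in three steps:

1) split the input feature map's channels into several groups;
2) convolve each group separately;
3) concatenate the results.

The new residual block splits the channels into 32 groups, which improves the model's accuracy.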

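The figure below shows the computational cost of grouped convolution, where g is the number of groups. Grouped convolution reduces the amount of computation by a factor of g.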
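In numbers: a k×k convolution from C_in to C_out channels has k·k·C_in·C_out weights, and splitting it into g groups divides that by g. The sketch below uses a hypothetical helper name. It evaluates this for the 128-channel 3×3 convolution in the first stage of resnext50_32x4d (width = 4 × 32). It compares g = 1, g = 32 and the extreme g = 128, where each group holds a single channel:

```python
def grouped_conv_params(k: int, c_in: int, c_out: int, groups: int) -> int:
    """Weights (no bias) of a k x k convolution split into `groups` groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    # each group maps c_in/groups input channels to c_out/groups output channels
    return groups * (k * k * (c_in // groups) * (c_out // groups))


for g in (1, 32, 128):
    print(g, grouped_conv_params(3, 128, 128, g))
```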

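III. Cat vs. dog image classification with LeNet

1. LeNet architecture

The LeNet architecture is shown below (figure from https://zhuanlan.zhihu.com/p/116181964).

2. PyTorch implementation

3.2.1 Loading the dataset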

import torch
import torch.nn as nn
import torchvision
from torchvision import models,transforms,datasets
import torch.nn.functional as F
from PIL import Image
import torch.optim as optim
import os
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using gpu: %s ' % torch.cuda.is_available())
train_path = './train/'
test_path = './test/'

def get_data(file_path):
    file_lst = os.listdir(file_path)  # all file names, e.g. cat_0.jpg
    data_lst = []
    for i in range(len(file_lst)):
        clas = file_lst[i][:3]  # "cat" or "dog" is at the start of the file name
        img_path = os.path.join(file_path, file_lst[i])  # full path to the image, for reading later
        if clas == 'cat':
            data_lst.append((img_path, 0))
        else:
            data_lst.append((img_path, 1))
    return data_lst


class catdog_set(torch.utils.data.Dataset):
    def __init__(self, path, transform):
        super(catdog_set, self).__init__()
        self.data_lst = get_data(path)  # list of (image path, label) pairs
        self.trans = torchvision.transforms.Compose(transform)

    def __len__(self):
        return len(self.data_lst)

    def __getitem__(self, index):
        (img, cls) = self.data_lst[index]
        image = self.trans(Image.open(img).convert('RGB'))  # force 3 channels
        label = torch.tensor(cls, dtype=torch.float32)
        return image, label


# Resize every input image to 128*128; each batch holds 128 images.
# Shuffle the training images every epoch so the batches vary.
train_loader = torch.utils.data.DataLoader(
    catdog_set(train_path, [transforms.Resize((128, 128)), transforms.ToTensor()]),
    batch_size=128, shuffle=True)

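The training set contains 20,000 images (10,000 cats and 10,000 dogs) and the test set contains 2,000 images. Download the dataset and place it in the same directory as the code. get_data returns a list with the path and label of every image in the given folder, for example: [('./train/cat_0.jpg', 0), ('./train/cat_1.jpg', 0), ..., ('./train/dog_9999.jpg', 1)]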
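The labeling rule in get_data depends only on the first three characters of the file name. The sketch below applies the same rule to a list of file names instead of reading the folder with `os.listdir`, so it runs without the dataset. Both function names are hypothetical.

```python
import os


def label_from_filename(file_name: str) -> int:
    """Same rule as get_data: names starting with 'cat' -> 0, everything else -> 1."""
    return 0 if file_name[:3] == "cat" else 1


def build_data_list(file_path: str, file_names: list) -> list:
    # hypothetical variant of get_data that takes the names as an argument
    return [(os.path.join(file_path, name), label_from_filename(name))
            for name in file_names]


print(build_data_list("./train/", ["cat_0.jpg", "dog_9999.jpg"]))
# [('./train/cat_0.jpg', 0), ('./train/dog_9999.jpg', 1)]
```

Any file whose name does not start with "cat" is labeled as a dog. The folders should therefore contain only cat_*.jpg and dog_*.jpg images.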

3.2.2 Network structure

Below is a LeNet-style PyTorch implementation, adapted to 128×128 RGB input and two output classes:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
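        self.Conv1 = nn.Conv2d(3, 6, 5)          # 3 -> 6 channels, 5x5 kernel
        self.Conv2 = nn.Conv2d(6, 16, 5)         # 6 -> 16 channels, 5x5 kernel
        self.pool = nn.MaxPool2d(kernel_size=2)  # shared 2x2 max pooling, halves H and W

        self.fc1 = nn.Linear(16*29*29, 32)       # 16 channels x 29 x 29 after the second pooling
        self.fc2 = nn.Linear(32, 2)              # two scores: cat, dog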


    def forward(self, x):
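        # The print calls show intermediate shapes for the check below;
        # remove them before training to avoid printing on every batch.
        x = self.Conv1(x)
        print(x.shape)
        x = self.pool(x)
        print(x.shape)
        x = self.pool(self.Conv2(x))
        print(x.shape)

        x = torch.flatten(x, 1)
        print(x.shape)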

        x = F.relu(self.fc1(x))

        x = self.fc2(x)

        return x

Pass a randomly generated tensor of the expected size through the network:

x = torch.randn(1, 3, 128, 128)
print(x.shape)
net = Net()
y = net(x)
print(y.shape)

The output is:

torch.Size([1, 3, 128, 128])
torch.Size([1, 6, 124, 124])
torch.Size([1, 6, 62, 62])
torch.Size([1, 16, 29, 29])
torch.Size([1, 13456])
torch.Size([1, 2])

The shapes work out, so the layers are wired correctly. The network outputs a 1×2 tensor, one score per class.
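The input size of fc1, 16*29*29 = 13456, can also be derived by hand. Each 5×5 convolution without padding shrinks the side by 4. Each 2×2 max pooling (stride 2 by default) halves it, rounding down. Here is a plain-Python sketch of that arithmetic, with a hypothetical helper:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128
size = conv_out(size, 5)            # Conv1: 5x5, no padding -> 124
size = conv_out(size, 2, stride=2)  # MaxPool2d(2) -> 62
size = conv_out(size, 5)            # Conv2: 5x5, no padding -> 58
size = conv_out(size, 2, stride=2)  # MaxPool2d(2) -> 29
flat = 16 * size * size             # 16 channels -> in_features of fc1
print(size, flat)  # 29 13456
```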

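3.2.3 Training the network

nn.CrossEntropyLoss() is the cross-entropy loss function, used for multi-class and binary classification. Its input is the raw output (logits) of the network's last layer. The forward function applies no softmax because the loss applies log-softmax to its input internally. Its targets are class indices, hence labels.long() in the loop below. For logits x and target class y, the cross-entropy is loss(x, y) = −log(exp(x[y]) / Σ_j exp(x[j])). By default it is averaged over the batch.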
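Below is a plain-Python version of the formula for a single sample (nn.CrossEntropyLoss additionally averages over the batch by default). It uses the log-sum-exp trick for numerical stability, and the function name is hypothetical.

```python
import math


def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


print(cross_entropy([0.0, 0.0], 0))  # ln 2, about 0.6931
print(cross_entropy([2.0, 0.0], 0))  # about 0.1269 (confident and correct)
print(cross_entropy([2.0, 0.0], 1))  # about 2.1269 (confident and wrong)
```

With two classes, a network that outputs equal scores has a loss of ln 2 ≈ 0.693. This is close to the loss reported after the first epoch in the training log below.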

net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
for epoch in range(30):  # train for 30 epochs
    for i, (inputs, labels) in enumerate(train_loader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # reset the gradients accumulated by the optimizer
        optimizer.zero_grad()
        # forward pass + backward pass + parameter update
        outputs = net(inputs)
        loss = criterion(outputs, labels.long())  # CrossEntropyLoss expects integer class indices
        loss.backward()
        optimizer.step()
    # loss of the last batch of this epoch
    print('Epoch: %d loss: %.6f' % (epoch + 1, loss.item()))
print('Finished Training')

Training output (each line shows the loss of the last batch in that epoch):

Epoch: 1 loss: 0.698980
Epoch: 2 loss: 0.482175
Epoch: 3 loss: 0.496106
Epoch: 4 loss: 0.491504
Epoch: 5 loss: 0.340280
Epoch: 6 loss: 0.421208
Epoch: 7 loss: 0.494740
Epoch: 8 loss: 0.276336
Epoch: 9 loss: 0.195770
Epoch: 10 loss: 0.157310
Epoch: 11 loss: 0.052218
Epoch: 12 loss: 0.055619
Epoch: 13 loss: 0.014557
Epoch: 14 loss: 0.010108
Epoch: 15 loss: 0.004856
Epoch: 16 loss: 0.007189
Epoch: 17 loss: 0.005779
Epoch: 18 loss: 0.108815
Epoch: 19 loss: 0.038461
Epoch: 20 loss: 0.057754
Epoch: 21 loss: 0.010165
Epoch: 22 loss: 0.001001
Epoch: 23 loss: 0.003251
Epoch: 24 loss: 0.000153
Epoch: 25 loss: 0.001171
Epoch: 26 loss: 0.000920
Epoch: 27 loss: 0.001027
Epoch: 28 loss: 0.000189
Epoch: 29 loss: 0.000538
Epoch: 30 loss: 0.000089

3.2.4 Testing and writing the submission .csv file

net.eval()  # switch to inference mode
test_transform = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])
with open('Lenet.csv', 'w') as resfile, torch.no_grad():  # no gradients needed for inference
    for i in range(0, 2000):
        img_PIL = Image.open('./test/' + str(i) + '.jpg').convert('RGB')
        # add a batch dimension: (3, 128, 128) -> (1, 3, 128, 128)
        img_tensor = test_transform(img_PIL).unsqueeze(0).to(device)
        out = net(img_tensor).cpu().numpy()
        # column 0 scores "cat" (label 0), column 1 scores "dog" (label 1)
        pred = 1 if out[0, 0] < out[0, 1] else 0
        resfile.write(str(i) + ',' + str(pred) + '\n')

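IV. Dogs vs. Cats with ResNet

Here a cat/dog classifier is trained with transfer learning, and only the network's fully connected layer is trained. The model definition below first defines the ResNet residual blocks. Since the 34-layer network is used, the residual block it needs is BasicBlock.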

import torch.nn as nn
import torch


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    """
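    Note: in the original paper, on the main branch of the dashed-line residual structure,
    the first 1x1 convolution has stride 2 and the second 3x3 convolution has stride 1.
    The official PyTorch implementation instead uses stride 1 for the first 1x1 convolution
    and stride 2 for the second 3x3 convolution, which improves top-1 accuracy by roughly 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch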
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)

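Instantiate the network and replace the last fully connected layer.

import os

net = resnet34()
# load pretrain weights
# download url: https://download.pytorch.org/models/resnet34-333f7ec4.pth
model_weight_path = "./resnet34-pre.pth"
assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
# To train only the fully connected layer as described above, uncomment the two
# lines below. They freeze the pretrained weights, while the new fc layer created
# afterwards stays trainable.
# for param in net.parameters():
#     param.requires_grad = False

# change fc layer structure: 2 classes (cat, dog)
in_channel = net.fc.in_features
net.fc = nn.Linear(in_channel, 2)

Training for 3 epochs: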

using 20000 images for training, 2000 images for validation.
train epoch[1/3] loss:0.062: 100%|██████████| 1250/1250 [07:29<00:00, 2.78it/s]
valid epoch[1/3]: 100%|██████████| 125/125 [00:24<00:00, 5.12it/s]
[epoch 1] train_loss: 0.176 val_accuracy: 0.979
train epoch[2/3] loss:0.149: 100%|██████████| 1250/1250 [06:19<00:00, 3.29it/s]
valid epoch[2/3]: 100%|██████████| 125/125 [00:19<00:00, 6.48it/s]
[epoch 2] train_loss: 0.145 val_accuracy: 0.979
train epoch[3/3] loss:0.046: 100%|██████████| 1250/1250 [06:13<00:00, 3.34it/s]
valid epoch[3/3]: 100%|██████████| 125/125 [00:20<00:00, 6.06it/s]
[epoch 3] train_loss: 0.126 val_accuracy: 0.981
Finished Training

The score after submission:

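V. Questions for reflection

1. Residual learning

Residual learning adds a block's output to its input element-wise. When the input and output dimensions differ, the shortcut must transform the input to match the output, for example with a 1×1 convolution. Each block only has to learn the residual F(x) = H(x) − x, so it can approximate an identity mapping by driving F(x) toward zero. This is how residual learning counters the drop in performance as the model gets deeper.

2. How Batch Normalization works

Batch normalization standardizes each channel of a batch of data to mean 0 and variance 1. Ideally the statistics would be computed over the feature maps of the whole training set, but that is clearly infeasible for a large dataset. Batch normalization therefore computes them over the feature maps of one batch. It then applies a learnable per-channel scale γ and shift β: y = γ·(x − μ_B)/√(σ_B² + ε) + β. At inference time it uses running averages of the mean and variance collected during training instead of the batch statistics.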
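The formula can be checked on a tiny example. The sketch below normalizes a flat list of values for a single channel. A real nn.BatchNorm2d differs from this hypothetical helper in two ways: it pools over the batch, height and width dimensions, and it tracks running statistics.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel over a batch: y = gamma * (x - mean) / sqrt(var + eps) + beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased variance, as used in training mode
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(out_mean, out_var)  # approximately 0 and 1
```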
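3. Why can grouped convolution improve accuracy? If it improves accuracy and also reduces computation, why not use as many groups as possible?

Grouped convolution cuts computation by a factor equal to the number of groups. For the same budget, the network can therefore afford wider layers with more channels. The ResNeXt paper reports that, at fixed complexity, increasing the number of groups improved accuracy.

More groups are not always better, however. Each group sees only its own slice of the channels. As the number of groups grows, the correlations between channels are increasingly ignored; in the extreme case, each group holds a single channel. Attention mechanisms were later proposed to capture these inter-channel correlations.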