第5周学习：ShuffleNet & EfficientNet & 迁移学习

_warrior_

已于 2022-08-11 12:21:09 修改

阅读量358

点赞数

文章标签：学习迁移学习人工智能

于 2022-08-11 11:58:09 首次发布

本文链接：https://blog.csdn.net/walkinginq/article/details/126279071

版权

1 论文阅读与学习

1.1 ShuffleNet V1 & V2

ShuffleNet V1
使用分组1x1卷积，每个卷积核只处理一部分的输入通道，可以降低计算量。
通道重排实现跨通道的信息交流。将channel等分并重新组合成新的channel，实现信息交互。
在这里插入图片描述
重排步骤：

reshape成g行n列的矩阵
Transpose转置
flatten

网络模型

(a)以depthwise convolution为骨干。
(b)以pointwise group和channel shuffle为骨干。
©下采样，采用concat连接方法。

ShuffleNet V2

MobileNet和ShuffleNet分别部署在CPU和ARM上得到的测试结果：

FLOPS仅反映卷积层，是衡量一个网络的间接指标，而非直接指标。
不同硬件上的测试结果不同。
数据读写MAC占用影响大。
Element-wise逐元素操作带来的开销不可忽略。

四条轻量化网络的设计原则：
输入输出通道相同时，MAC最小。
对于轻量级CNN网络，常采用深度可分割卷积（depthwise separable convolutions），其中点卷积（ pointwise convolution）即1x1卷积复杂度最大。假定输入和输出特征的通道数分别为 c1 和 c2 ，特征图的空间大小为 hw，那么1x1卷积的FLOPs为 B=c1c2hw 。对应的MAC为hw(c1+c2)+c1c2。

根据均值不等式
分组数过大的分组会增加MAC
变量g为不同的分组数，GPU的FLOPS相同，分组数越多，推理越慢，MAC越大。
碎片化操作不利于并行加速
series为串行操作，parallel为并行操作。在AlexNet、MobileNet、Xception中运用了并行操作。碎片化操作越多，推理越慢，特别对于GPU并行设备、小型网络。
逐元素操作带来的内存耗时不可忽略
逐元素操作包括：add逐元素相加，Relu激活函数等操作。

改进的网络模型
不分组
采用channle split
取消channel shuffle
concat连接

在这里插入图片描述
(a)ShuffleNet V1基本模型
(b)ShuffleNet V1下采样模型
©ShuffleNet V2 基本模型
(d)ShuffleNet V2下采样模型

Pytorch搭建ShuffleNetV2网络

import torch
from torch import Tensor
import torch.nn as nn


def channel_shuffle(x: Tensor, groups: int) -> Tensor:

    batch_size, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups

    # reshape
    # [batch_size, num_channels, height, width] -> [batch_size, groups, channels_per_group, height, width]
    x = x.view(batch_size, groups, channels_per_group, height, width)

    x = torch.transpose(x, 1, 2).contiguous()

    # flatten
    x = x.view(batch_size, -1, height, width)

    return x


class InvertedResidual(nn.Module):
    def __init__(self, input_c: int, output_c: int, stride: int):
        super(InvertedResidual, self).__init__()

        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        self.stride = stride

        assert output_c % 2 == 0
        branch_features = output_c // 2
        # 当stride为1时，input_channel应该是branch_features的两倍
        # python中 '<<' 是位运算，可理解为计算×2的快速方法
        assert (self.stride != 1) or (input_c == branch_features << 1)

        if self.stride == 2:
            self.branch1 = nn.Sequential(
                self.depthwise_conv(input_c, input_c, kernel_s=3, stride=self.stride, padding=1),
                nn.BatchNorm2d(input_c),
                nn.Conv2d(input_c, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(branch_features),
                nn.ReLU(inplace=True)
            )
        else:
            self.branch1 = nn.Sequential()

        self.branch2 = nn.Sequential(
            nn.Conv2d(input_c if self.stride > 1 else branch_features, branch_features, kernel_size=1,
                      stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True),
            self.depthwise_conv(branch_features, branch_features, kernel_s=3, stride=self.stride, padding=1),
            nn.BatchNorm2d(branch_features),
            nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True)
        )

    @staticmethod
    def depthwise_conv(input_c: int,
                       output_c: int,
                       kernel_s: int,
                       stride: int = 1,
                       padding: int = 0,
                       bias: bool = False) -> nn.Conv2d:
        return nn.Conv2d(in_channels=input_c, out_channels=output_c, kernel_size=kernel_s,
                         stride=stride, padding=padding, bias=bias, groups=input_c)

    def forward(self, x: Tensor) -> Tensor:
        if self.stride == 1:
            x1, x2 = x.chunk(2, dim=1)
            out = torch.cat((x1, self.branch2(x2)), dim=1)
        else:
            out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)

        out = channel_shuffle(out, 2)

        return out


class ShuffleNetV2(nn.Module):
    def __init__(self,
                 stages_repeats: List[int],
                 stages_out_channels: List[int],
                 num_classes: int = 1000,
                 inverted_residual: Callable[..., nn.Module] = InvertedResidual):
        super(ShuffleNetV2, self).__init__()

        if len(stages_repeats) != 3:
            raise ValueError("expected stages_repeats as list of 3 positive ints")
        if len(stages_out_channels) != 5:
            raise ValueError("expected stages_out_channels as list of 5 positive ints")
        self._stage_out_channels = stages_out_channels

        # input RGB image
        input_channels = 3
        output_channels = self._stage_out_channels[0]

        self.conv1 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )
        input_channels = output_channels

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Static annotations for mypy
        self.stage2: nn.Sequential
        self.stage3: nn.Sequential
        self.stage4: nn.Sequential

        stage_names = ["stage{}".format(i) for i in [2, 3, 4]]
        for name, repeats, output_channels in zip(stage_names, stages_repeats,
                                                  self._stage_out_channels[1:]):
            seq = [inverted_residual(input_channels, output_channels, 2)]
            for i in range(repeats - 1):
                seq.append(inverted_residual(output_channels, output_channels, 1))
            setattr(self, name, nn.Sequential(*seq))
            input_channels = output_channels

        output_channels = self._stage_out_channels[-1]
        self.conv5 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )

        self.fc = nn.Linear(output_channels, num_classes)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.conv5(x)
        x = x.mean([2, 3])  # global pool
        x = self.fc(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def shufflenet_v2_x0_5(num_classes=1000):

    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 48, 96, 192, 1024],
                         num_classes=num_classes)

    return model


def shufflenet_v2_x1_0(num_classes=1000):

    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 116, 232, 464, 1024],
                         num_classes=num_classes)

    return model

训练网络
在这里插入图片描述
测试

1.2 EfficientNet V3

论文主要用NAS（Neural Architecture Search）技术来搜索网络的图像输入分辨率，网络的深度以及channel的宽度三个参数的合理化配置，同时来探索这三个参数的影响，将EfficientNet与其他网络的对比。
在这里插入图片描述
通过增加网络的width即增加卷积核的个数（增加特征矩阵的channels）如图(b)
通过增加网络的深度即使用更多的层结构如图©
通过增加输入网络的分辨率如图(d)
论文中会同时增加网络的width、网络的深度以及输入网络的分辨率如图(e)
在这里插入图片描述

增加网络的深度depth能够得到更加丰富、复杂的特征并且能够很好的应用到其它任务中。但网络的深度过深会面临梯度消失，训练困难的问题。
增加网络的宽度width能够获得更高细粒度的特征并且也更容易训练，但对于width很大而深度较浅的网络往往很难学习到更深层次的特征。
增加输入网络的图像分辨率能够潜在得获得更高细粒度的特征模板，但对于非常高的输入分辨率，准确率的增益也会减小。并且大分辨率图像会增加计算量。

作者通过 NAS技术搜索得到的EfficientNetB0的结构，整个网络框架由一系列Stage组成，Fi表示对应Stage的运算操作，Li表示在该Stage中重复Fi的次数。
在这里插入图片描述

为了探究d , r , w 这三个因子对最终准确率的影响，则将d , r , w 加入到公式中，我们可以得到抽象化后的优化问题。

作者又提出了一个混合缩放方法，使用一个混合因子ϕ 去统一的缩放width，depth，resolution参数。
在这里插入图片描述
网络结构
下表为EfficientNet-B0的网络框架（B1-B7在B0的基础上修改Resolution，Channels以及Layers）。
网络总共分成了9个Stage，第一个Stage一个卷积核大小为3x3步距为2的普通卷积层（包含BN和激活函数Swish），Stage2—Stage8都是在重复堆叠MBConv结构（最后一列的Layers表示该Stage重复MBConv结构多少次），而Stage9由一个普通的1x1的卷积层（包含BN和激活函数Swish），一个平均池化层和一个全连接层组成。表格中每个MBConv后会跟一个数字1或6，这里的1或6就是倍率因子n。
在这里插入图片描述
MBConv结构
MBConv结构主要由一个1x1的普通卷积（升维作用，包含BN和Swish），一个kxk的Depthwise Conv卷积（包含BN和Swish）k的具体值可看EfficientNet-B0的网络框架主要有3x3和5x5两种情况，一个SE模块，一个1x1的普通卷积（降维作用，包含BN），一个Droupout层构成。
在这里插入图片描述
EfficientNet与当时主流网络的性能参数对比：

Pytorch搭建EfficientNet 网络

import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import functional as F


def _make_divisible(ch, divisor=8, min_ch=None):

    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


def drop_path(x, drop_prob: float = 0., training: bool = False):

    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):

    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.SiLU  # alias Swish  (torch>=1.7)

        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer())


class SqueezeExcitation(nn.Module):
    def __init__(self,
                 input_c: int,   # block input channel
                 expand_c: int,  # block expand channel
                 squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = input_c // squeeze_factor
        self.fc1 = nn.Conv2d(expand_c, squeeze_c, 1)
        self.ac1 = nn.SiLU()  # alias Swish
        self.fc2 = nn.Conv2d(squeeze_c, expand_c, 1)
        self.ac2 = nn.Sigmoid()

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = self.ac1(scale)
        scale = self.fc2(scale)
        scale = self.ac2(scale)
        return scale * x


class InvertedResidualConfig:
    # kernel_size, in_channel, out_channel, exp_ratio, strides, use_SE, drop_connect_rate
    def __init__(self,
                 kernel: int,          # 3 or 5
                 input_c: int,
                 out_c: int,
                 expanded_ratio: int,  # 1 or 6
                 stride: int,          # 1 or 2
                 use_se: bool,         # True
                 drop_rate: float,
                 index: str,           # 1a, 2a, 2b, ...
                 width_coefficient: float):
        self.input_c = self.adjust_channels(input_c, width_coefficient)
        self.kernel = kernel
        self.expanded_c = self.input_c * expanded_ratio
        self.out_c = self.adjust_channels(out_c, width_coefficient)
        self.use_se = use_se
        self.stride = stride
        self.drop_rate = drop_rate
        self.index = index

    @staticmethod
    def adjust_channels(channels: int, width_coefficient: float):
        return _make_divisible(channels * width_coefficient, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers = OrderedDict()
        activation_layer = nn.SiLU  # alias Swish

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.update({"expand_conv": ConvBNActivation(cnf.input_c,
                                                           cnf.expanded_c,
                                                           kernel_size=1,
                                                           norm_layer=norm_layer,
                                                           activation_layer=activation_layer)})

        # depthwise
        layers.update({"dwconv": ConvBNActivation(cnf.expanded_c,
                                                  cnf.expanded_c,
                                                  kernel_size=cnf.kernel,
                                                  stride=cnf.stride,
                                                  groups=cnf.expanded_c,
                                                  norm_layer=norm_layer,
                                                  activation_layer=activation_layer)})

        if cnf.use_se:
            layers.update({"se": SqueezeExcitation(cnf.input_c,
                                                   cnf.expanded_c)})

        # project
        layers.update({"project_conv": ConvBNActivation(cnf.expanded_c,
                                                        cnf.out_c,
                                                        kernel_size=1,
                                                        norm_layer=norm_layer,
                                                        activation_layer=nn.Identity)})

        self.block = nn.Sequential(layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

        # 只有在使用shortcut连接时才使用dropout层
        if self.use_res_connect and cnf.drop_rate > 0:
            self.dropout = DropPath(cnf.drop_rate)
        else:
            self.dropout = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        result = self.dropout(result)
        if self.use_res_connect:
            result += x

        return result


class EfficientNet(nn.Module):
    def __init__(self,
                 width_coefficient: float,
                 depth_coefficient: float,
                 num_classes: int = 1000,
                 dropout_rate: float = 0.2,
                 drop_connect_rate: float = 0.2,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None
                 ):
        super(EfficientNet, self).__init__()

        # kernel_size, in_channel, out_channel, exp_ratio, strides, use_SE, drop_connect_rate, repeats
        default_cnf = [[3, 32, 16, 1, 1, True, drop_connect_rate, 1],
                       [3, 16, 24, 6, 2, True, drop_connect_rate, 2],
                       [5, 24, 40, 6, 2, True, drop_connect_rate, 2],
                       [3, 40, 80, 6, 2, True, drop_connect_rate, 3],
                       [5, 80, 112, 6, 1, True, drop_connect_rate, 3],
                       [5, 112, 192, 6, 2, True, drop_connect_rate, 4],
                       [3, 192, 320, 6, 1, True, drop_connect_rate, 1]]

        def round_repeats(repeats):
            """Round number of repeats based on depth multiplier."""
            return int(math.ceil(depth_coefficient * repeats))

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=1e-3, momentum=0.1)

        adjust_channels = partial(InvertedResidualConfig.adjust_channels,
                                  width_coefficient=width_coefficient)

        # build inverted_residual_setting
        bneck_conf = partial(InvertedResidualConfig,
                             width_coefficient=width_coefficient)

        b = 0
        num_blocks = float(sum(round_repeats(i[-1]) for i in default_cnf))
        inverted_residual_setting = []
        for stage, args in enumerate(default_cnf):
            cnf = copy.copy(args)
            for i in range(round_repeats(cnf.pop(-1))):
                if i > 0:
                    # strides equal 1 except first cnf
                    cnf[-3] = 1  # strides
                    cnf[1] = cnf[2]  # input_channel equal output_channel

                cnf[-1] = args[-2] * b / num_blocks  # update dropout ratio
                index = str(stage + 1) + chr(i + 97)  # 1a, 2a, 2b, ...
                inverted_residual_setting.append(bneck_conf(*cnf, index))
                b += 1

        # create layers
        layers = OrderedDict()

        # first conv
        layers.update({"stem_conv": ConvBNActivation(in_planes=3,
                                                     out_planes=adjust_channels(32),
                                                     kernel_size=3,
                                                     stride=2,
                                                     norm_layer=norm_layer)})

        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.update({cnf.index: block(cnf, norm_layer)})

        # build top
        last_conv_input_c = inverted_residual_setting[-1].out_c
        last_conv_output_c = adjust_channels(1280)
        layers.update({"top": ConvBNActivation(in_planes=last_conv_input_c,
                                               out_planes=last_conv_output_c,
                                               kernel_size=1,
                                               norm_layer=norm_layer)})

        self.features = nn.Sequential(layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)

        classifier = []
        if dropout_rate > 0:
            classifier.append(nn.Dropout(p=dropout_rate, inplace=True))
        classifier.append(nn.Linear(last_conv_output_c, num_classes))
        self.classifier = nn.Sequential(*classifier)

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def efficientnet_b0(num_classes=1000):
    # input image size 224x224
    return EfficientNet(width_coefficient=1.0,
                        depth_coefficient=1.0,
                        dropout_rate=0.2,
                        num_classes=num_classes)


def efficientnet_b1(num_classes=1000):
    # input image size 240x240
    return EfficientNet(width_coefficient=1.0,
                        depth_coefficient=1.1,
                        dropout_rate=0.2,
                        num_classes=num_classes)


def efficientnet_b2(num_classes=1000):
    # input image size 260x260
    return EfficientNet(width_coefficient=1.1,
                        depth_coefficient=1.2,
                        dropout_rate=0.3,
                        num_classes=num_classes)


def efficientnet_b3(num_classes=1000):
    # input image size 300x300
    return EfficientNet(width_coefficient=1.2,
                        depth_coefficient=1.4,
                        dropout_rate=0.3,
                        num_classes=num_classes)


def efficientnet_b4(num_classes=1000):
    # input image size 380x380
    return EfficientNet(width_coefficient=1.4,
                        depth_coefficient=1.8,
                        dropout_rate=0.4,
                        num_classes=num_classes)


def efficientnet_b5(num_classes=1000):
    # input image size 456x456
    return EfficientNet(width_coefficient=1.6,
                        depth_coefficient=2.2,
                        dropout_rate=0.4,
                        num_classes=num_classes)


def efficientnet_b6(num_classes=1000):
    # input image size 528x528
    return EfficientNet(width_coefficient=1.8,
                        depth_coefficient=2.6,
                        dropout_rate=0.5,
                        num_classes=num_classes)


def efficientnet_b7(num_classes=1000):
    # input image size 600x600
    return EfficientNet(width_coefficient=2.0,
                        depth_coefficient=3.1,
                        dropout_rate=0.5,
                        num_classes=num_classes)

训练网络
在这里插入图片描述
测试

1.3 Transformer 中的 multi-head self-attention

Self-Attention
假设输入x1,x2，通过Input Embedding输入映射到a1,a2，接着通过三个变换矩阵Wq,Wk,Wv，得到对应的qi,ki,vi。
在这里插入图片描述

q代表query，后续会去和每一个k进行匹配
k代表key，后续会被每个q匹配
v代表从a中提取得到的信息
q和k匹配的过程可以理解成计算两者的相关性，相关性越大对应v的权重也就越大

在这里插入图片描述

Multi-Head Attention
Multi-Head Attention模块基本使用的还是self Attention模块，使用多头注意力机制能够联合来自不同head部分学习到的信息。

2 代码练习

2.1 VGG模型猫狗大战

import numpy as np
import matplotlib.pyplot as plt
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import models,transforms,datasets
import time
import json


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using gpu: %s ' % torch.cuda.is_available())

! wget http://fenggao-image.stor.sinaapp.com/dogscats.zip
! unzip dogscats.zip

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

vgg_format = transforms.Compose([
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                normalize,
            ])

data_dir = './dogscats'

dsets = {x: datasets.ImageFolder(os.path.join(data_dir, x), vgg_format)
         for x in ['train', 'valid']}

dset_sizes = {x: len(dsets[x]) for x in ['train', 'valid']}
dset_classes = dsets['train'].classes

loader_train = torch.utils.data.DataLoader(dsets['train'], batch_size=64, shuffle=True, num_workers=6)
loader_valid = torch.utils.data.DataLoader(dsets['valid'], batch_size=5, shuffle=False, num_workers=6)


'''
valid 数据一共有2000张图，每个batch是5张，因此，下面进行遍历一共会输出到 400
同时，把第一个 batch 保存到 inputs_try, labels_try，分别查看
'''
count = 1
for data in loader_valid:
    print(count, end='\n')
    if count == 1:
        inputs_try,labels_try = data
    count +=1

print(labels_try)
print(inputs_try.shape)

# 显示图片的小程序

def imshow(inp, title=None):
#   Imshow for Tensor.
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = np.clip(std * inp + mean, 0,1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

在这里插入图片描述
创建 VGG Model

!wget https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json

model_vgg = models.vgg16(pretrained=True)

with open('./imagenet_class_index.json') as f:
    class_dict = json.load(f)
dic_imagenet = [class_dict[str(i)][1] for i in range(len(class_dict))]

inputs_try , labels_try = inputs_try.to(device), labels_try.to(device)
model_vgg = model_vgg.to(device)

outputs_try = model_vgg(inputs_try)

print(outputs_try)
print(outputs_try.shape)

'''
可以看到结果为5行，1000列的数据，每一列代表对每一种目标识别的结果。
但是我也可以观察到，结果非常奇葩，有负数，有正数，
为了将VGG网络输出的结果转化为对每一类的预测概率，我们把结果输入到 Softmax 函数
'''
m_softm = nn.Softmax(dim=1)
probs = m_softm(outputs_try)
vals_try,pred_try = torch.max(probs,dim=1)

print( 'prob sum: ', torch.sum(probs,1))
print( 'vals_try: ', vals_try)
print( 'pred_try: ', pred_try)

print([dic_imagenet[i] for i in pred_try.data])
imshow(torchvision.utils.make_grid(inputs_try.data.cpu()), 
       title=[dset_classes[x] for x in labels_try.data.cpu()])

修改最后一层，冻结前面层的参数
把最后的 nn.Linear 层由1000类，替换为2类。为了在训练中冻结前面层的参数，需要设置 required_grad=False。这样，反向传播训练梯度时，前面层的权重就不会自动更新了。训练中，只会更新最后一层的参数。

print(model_vgg)

model_vgg_new = model_vgg;

for param in model_vgg_new.parameters():
    param.requires_grad = False
model_vgg_new.classifier._modules['6'] = nn.Linear(4096, 2)
model_vgg_new.classifier._modules['7'] = torch.nn.LogSoftmax(dim = 1)

model_vgg_new = model_vgg_new.to(device)

print(model_vgg_new.classifier)

VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=2, bias=True)
(7): LogSoftmax(dim=1)
)

'''
第一步：创建损失函数和优化器

损失函数 NLLLoss() 的 输入 是一个对数概率向量和一个目标标签. 
它不会为我们计算对数概率，适合最后一层是log_softmax()的网络. 
'''
criterion = nn.NLLLoss()

# 学习率
lr = 0.001

# 随机梯度下降
optimizer_vgg = torch.optim.SGD(model_vgg_new.classifier[6].parameters(),lr = lr)

'''
第二步：训练模型
'''

def train_model(model,dataloader,size,epochs=1,optimizer=None):
    model.train()
    
    for epoch in range(epochs):
        running_loss = 0.0
        running_corrects = 0
        count = 0
        for inputs,classes in dataloader:
            inputs = inputs.to(device)
            classes = classes.to(device)
            outputs = model(inputs)
            loss = criterion(outputs,classes)           
            optimizer = optimizer
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            _,preds = torch.max(outputs.data,1)
            # statistics
            running_loss += loss.data.item()
            running_corrects += torch.sum(preds == classes.data)
            count += len(inputs)
            print('Training: No. ', count, ' process ... total: ', size)
        epoch_loss = running_loss / size
        epoch_acc = running_corrects.data.item() / size
        print('Loss: {:.4f} Acc: {:.4f}'.format(
                     epoch_loss, epoch_acc))
        
        
# 模型训练
train_model(model_vgg_new,loader_train,size=dset_sizes['train'], epochs=1, 
            optimizer=optimizer_vgg)

Training: No. 1216 process … total: 1800
Training: No. 1280 process … total: 1800
Training: No. 1344 process … total: 1800
Training: No. 1408 process … total: 1800
Training: No. 1472 process … total: 1800
Training: No. 1536 process … total: 1800
Training: No. 1600 process … total: 1800
Training: No. 1664 process … total: 1800
Training: No. 1728 process … total: 1800
Training: No. 1792 process … total: 1800
Training: No. 1800 process … total: 1800
Loss: 0.0064 Acc: 0.8472

def test_model(model,dataloader,size):
    model.eval()
    predictions = np.zeros(size)
    all_classes = np.zeros(size)
    all_proba = np.zeros((size,2))
    i = 0
    running_loss = 0.0
    running_corrects = 0
    for inputs,classes in dataloader:
        inputs = inputs.to(device)
        classes = classes.to(device)
        outputs = model(inputs)
        loss = criterion(outputs,classes)           
        _,preds = torch.max(outputs.data,1)
        # statistics
        running_loss += loss.data.item()
        running_corrects += torch.sum(preds == classes.data)
        predictions[i:i+len(classes)] = preds.to('cpu').numpy()
        all_classes[i:i+len(classes)] = classes.to('cpu').numpy()
        all_proba[i:i+len(classes),:] = outputs.data.to('cpu').numpy()
        i += len(classes)
        print('Testing: No. ', i, ' process ... total: ', size)        
    epoch_loss = running_loss / size
    epoch_acc = running_corrects.data.item() / size
    print('Loss: {:.4f} Acc: {:.4f}'.format(
                     epoch_loss, epoch_acc))
    return predictions, all_proba, all_classes
  
predictions, all_proba, all_classes = test_model(model_vgg_new,loader_valid,size=dset_sizes['valid'])

Testing: No. 1970 process … total: 2000
Testing: No. 1975 process … total: 2000
Testing: No. 1980 process … total: 2000
Testing: No. 1985 process … total: 2000
Testing: No. 1990 process … total: 2000
Testing: No. 1995 process … total: 2000
Testing: No. 2000 process … total: 2000
Loss: 0.0473 Acc: 0.9495

2.2 迁移学习——AI艺术鉴赏

阅读亚军选手的方案，主干网络resnest200，输入448尺寸，在不同loss下取得5组最好效果，最后进行投票，得到最后分数。将训练好的网络输出层更改，使用softmax得到概率进行分类，网络的具体实现没有完全明白，下面是网络结构部分和loss评分代码。

# build a model
model =resnest200(pretrained=True)
model.avgpool = torch.nn.AdaptiveAvgPool2d(output_size=1)
model.fc = torch.nn.Linear(model.fc.in_features,49)
model = torch.nn.DataParallel(model).cuda()

def load_pre_cloth_model_dict(self, state_dict):
    own_state = self.state_dict()
    for name, param in state_dict.items():
        if name not in own_state:
            continue
        if 'fc' in name:
            continue
        if isinstance(param, nn.Parameter):
            # backwards compatibility for serialized parameters
            param = param.data
        own_state[name].copy_(param)

if use_pre_model:
    print('using pre model')
    pre_model_path = ''
    load_pre_cloth_model_dict(model, torch.load(pre_model_path)['state_dict'])

# optionally resume from a checkpoint
if resume:
    if os.path.isfile(resume):
        print("=> loading checkpoint '{}'".format(resume))
        checkpoint = torch.load(resume)
        start_epoch = checkpoint['epoch']
        best_score = checkpoint['best_score']
        stage = checkpoint['stage']
        lr = checkpoint['lr']
        model.load_state_dict(checkpoint['state_dict'])
        no_improved_times = checkpoint['no_improved_times']
        if no_improved_times == 0:
            model.load_state_dict(torch.load('./model/%s/model_best.pth.tar' % file_name)['state_dict'])
        print("=> loaded checkpoint (epoch {})".format(checkpoint['epoch']))
    else:
        print("=> no checkpoint found at '{}'".format(resume))

def validate(val_loader, model, criterion):
    batch_time = AverageMeter()
    # losses = AverageMeter()
    # acc = AverageMeter()

    # switch to evaluate mode
    model.eval()

    # 保存概率，用于评测
    val_imgs, val_preds, val_labels, = [], [], []

    end = time.time()
    for i, (images, labels, img_path) in enumerate(val_loader):
        # if len(labels) % workers == 1:
        #     images = images[:-1]
        #     labels = labels[:-1]
        image_var = torch.tensor(images, requires_grad=False).cuda(non_blocking=True)  # for pytorch 0.4
        # label_var = torch.tensor(labels, requires_grad=False).cuda(async=True)  # for pytorch 0.4
        target = torch.tensor(labels).cuda(non_blocking=True)

        # compute y_pred
        with torch.no_grad():
            y_pred = model(image_var)
            loss = criterion(y_pred, target)

        # measure accuracy and record loss
        # prec, PRED_COUNT = accuracy(y_pred.data, labels, topk=(1, 1))
        # losses.update(loss.item(), images.size(0))
        # acc.update(prec, PRED_COUNT)

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % (print_freq * 5) == 0:
            print('TrainVal: [{0}/{1}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'.format(i, len(val_loader),
                                                                              batch_time=batch_time))

        # 保存概率，用于评测
        smax_out = smax(y_pred)
        val_imgs.extend(img_path)
        val_preds.extend([i.tolist() for i in smax_out])
        val_labels.extend([i.item() for i in labels])
    val_preds = [';'.join([str(j) for j in i]) for i in val_preds]
    val_score = pd.DataFrame({'img_path': val_imgs, 'preds': val_preds, 'label': val_labels,})
    val_score.to_csv('./result/%s/val_score.csv' % file_name, index=False)
    acc, f1  = score(val_score)
    print('acc: %.4f, f1: %.4f' % (acc, f1))
    print(' * Score {final_score:.4f}'.format(final_score=f1), '(Previous Best Score: %.4f)' % best_score)
    return acc, f1

def test(test_loader, model):
    csv_map = OrderedDict({'FileName': [], 'type': [], 'probability': []})
    # switch to evaluate mode
    model.eval()
    for i, (images, filepath) in enumerate(tqdm(test_loader)):
        # bs, ncrops, c, h, w = images.size()

        filepath = [str(i) for i in filepath]
        image_var = torch.tensor(images, requires_grad=False)  # for pytorch 0.4

        with torch.no_grad():
            y_pred = model(image_var)  # fuse batch size and ncrops
            # y_pred = y_pred.view(bs, ncrops, -1).mean(1) # avg over crops

            # get the index of the max log-probability
            smax = nn.Softmax()
            smax_out = smax(y_pred)
        csv_map['FileName'].extend(filepath)
        for output in smax_out:
            prob = ';'.join([str(i) for i in output.data.tolist()])
            csv_map['probability'].append(prob)
            csv_map['type'].append(np.argmax(output.data.tolist()))
        # print(len(csv_map['filename']), len(csv_map['probability']))

    result = pd.DataFrame(csv_map)
    result.to_csv('./result/%s/submission.csv' % file_name, index=False)
    result[['FileName','type']].to_csv('./result/%s/final_submission.csv' % file_name, index=False)
    return

def save_checkpoint(state, is_best, filename='./model/%s/checkpoint.pth.tar' % file_name):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, './model/%s/model_best.pth.tar' % file_name)

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

def adjust_learning_rate():
    nonlocal lr
    lr = lr / lr_decay
    return optim.Adam(model.parameters(), lr, weight_decay=weight_decay, amsgrad=True)

def accuracy(y_pred, y_actual, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    final_acc = 0
    maxk = max(topk)
    # for prob_threshold in np.arange(0, 1, 0.01):
    PRED_COUNT = y_actual.size(0)
    PRED_CORRECT_COUNT = 0

    prob, pred = y_pred.topk(maxk, 1, True, True)
    # prob = np.where(prob > prob_threshold, prob, 0)


    for j in range(pred.size(0)):
        if int(y_actual[j]) == int(pred[j]):
            PRED_CORRECT_COUNT += 1
    if PRED_COUNT == 0:
        final_acc = 0
    else:
        final_acc = PRED_CORRECT_COUNT / PRED_COUNT
    return final_acc * 100, PRED_COUNT

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

def doitf(tp, fp, fn):
    if (tp + fp == 0):
        return 0
    if (tp + fn == 0):
        return 0
    pre = float(1.0 * float(tp) / float(tp + fp))
    rec = float(1.0 * float(tp) / float(tp + fn))
    if (pre + rec == 0):
        return 0
    return (2 * pre * rec) / (pre + rec)

# 参数 samples_num 表示选取多少个样本来取平均
def score(val_score):
    val_score['preds'] = val_score['preds'].map(lambda x: [float(i) for i in x.split(';')])
    acc = 0
    tp = np.zeros(49)
    fp = np.zeros(49)
    fn = np.zeros(49)
    f1 = np.zeros(49)
    f1_tot = 0

    print(val_score.head(10))

    val_score['preds_label'] = val_score['preds'].apply(lambda x: np.argmax(x))
    for i in range(val_score.shape[0]):
        preds = val_score['preds_label'].iloc[i]
        label = val_score['label'].iloc[i]
        if (preds == label):
            acc = acc + 1
            tp[label] = tp[label] + 1
        else:
            fp[preds] = fp[preds] + 1
            fn[label] = fn[label] + 1
    
    for classes in range(49):
        f1[classes] = doitf(tp[classes], fp[classes], fn[classes])
        f1_tot = f1_tot + f1[classes]
    acc = acc / val_score.shape[0]
    f1_tot = f1_tot / 49

    return acc, f1_tot

# define loss function (criterion) and pptimizer
criterion = nn.CrossEntropyLoss().cuda()

# optimizer = optim.Adam(model.module.last_linear.parameters(), lr, weight_decay=weight_decay, amsgrad=True)
optimizer = optim.Adam(model.parameters(), lr, weight_decay=weight_decay, amsgrad=True)

if evaluate:
    validate(val_loader, model, criterion)
else:
    for epoch in range(start_epoch, total_epochs):
        if stage >= total_stages - 1:
            break
        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch)
        # evaluate on validation set
        if epoch >= 0:
            acc , f1 = validate(val_loader, model, criterion)

            with open('./result/%s.txt' % file_name, 'a') as acc_file:
                acc_file.write('Epoch: %2d, acc: %.8f, f1: %.8f\n' % (epoch, acc, f1))

            # remember best Accuracy and save checkpoint
            is_best = acc > best_score
            best_score = max(acc, best_score)

            # if (epoch + 1) in np.cumsum(stage_epochs)[:-1]:
            #     stage += 1
            #     optimizer = adjust_learning_rate()

            if is_best:
                no_improved_times = 0
            else:
                no_improved_times += 1

            print('stage: %d, no_improved_times: %d' % (stage, no_improved_times))

            if no_improved_times >= patience:
                stage += 1
                optimizer = adjust_learning_rate()

            state = {
                'epoch': epoch + 1,
                'arch': pre_model,
                'state_dict': model.state_dict(),
                'best_score': best_score,
                'no_improved_times': no_improved_times,
                'stage': stage,
                'lr': lr,
            }
            save_checkpoint(state, is_best)

            # if (epoch + 1) in np.cumsum(stage_epochs)[:-1]:
            if no_improved_times >= patience:
                no_improved_times = 0
                model.load_state_dict(torch.load('./model/%s/model_best.pth.tar' % file_name)['state_dict'])
                print('Step into next stage')
                with open('./result/%s.txt' % file_name, 'a') as acc_file:
                    acc_file.write('---------------------Step into next stage---------------------\n')