Hung-yi Lee (李宏毅) Homework 7, Part 1: Network Compression (Architecture Design)


闲看庭前雪


Original link: https://blog.csdn.net/qq_43605708/article/details/109502693


Network Compression: Architecture Design

Preface
I. Architecture Design
- 1. Background
- 2. Code details
II. Code Example
Summary

Preface

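Homework 7 is about compressing network models so that they are no longer bloated: reducing the amount of computation while keeping the original accuracy, or even exceeding it. Computing power is bounded by physical limits, so making better use of it is worth exploring. In this post, working through Hung-yi Lee's homework, I give my own understanding along with annotations of the related code.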

Hung-yi Lee's homework covers four methods:
Knowledge Distillation
Network Pruning
Architecture Design (designing layers with fewer parameters)
Weight Quantization

I. Architecture Design

1. Background

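In the figure, blue shows how the channels of one layer connect to those of the next, and green shows how the receptive field expands. (Figure taken from arXiv:1810.04231.)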

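(1) First, understand the receptive field and the channel connections of a standard convolution. Panel (a) shows that the receptive field of a standard convolution grows, and that along the channel direction each input channel is convolved and the results are summed. Along the channel direction the weights are connected the same way as in a fully connected layer: every output channel is connected to every input channel.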
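The "convolve each channel, then sum" behaviour can be checked directly. A minimal sketch, assuming PyTorch's `nn.Conv2d` (which computes cross-correlation, so the kernel is not flipped): the weight has shape `(out_channels, in_channels, k, k)`, and one output position is recomputed by hand. The layer sizes and the position `(1, 2)` are arbitrary illustration values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=3, padding=0, bias=False)
x = torch.randn(1, 4, 5, 5)
y = conv(x)  # shape (1, 3, 3, 3)

# Weight shape (3, 4, 3, 3): every output channel owns a 3x3 filter for EVERY input channel
print(tuple(conv.weight.shape))

# Output at row 1, col 2 uses the 3x3 input window starting at (1, 2) across all 4 channels
patch = x[0, :, 1:4, 2:5]                          # (4, 3, 3)
manual = (conv.weight * patch).sum(dim=(1, 2, 3))  # per-channel products, summed over channels
print(torch.allclose(manual, y[0, :, 1, 2], atol=1e-5))  # True
```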

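(2) DW (Depthwise Convolution Layer) + PW (Pointwise Convolution Layer)
Look at the blue connections first. Each feature map (channel) is processed by its own filter; then PW (Pointwise Convolution Layer) performs a pointwise convolution that combines all channels of the DW output at each individual pixel. This is equivalent to a fully connected layer applied to every pixel.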
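Both claims in (2) can be checked numerically. This sketch (arbitrary sizes, assuming PyTorch) confirms that (a) a depthwise layer, i.e. `groups` equal to the channel count, filters each channel on its own, and (b) a 1x1 pointwise convolution gives the same result as an `nn.Linear` applied to the channel vector of every pixel, once the two layers share weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
c_in, c_out = 8, 5
x = torch.randn(2, c_in, 6, 6)

# (a) Depthwise: one 3x3 filter per channel, so output channel 0 depends only on input channel 0
dw = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in)
y_dw = dw(x)
ch0 = F.conv2d(x[:, :1], dw.weight[:1], dw.bias[:1], padding=1)
print(torch.allclose(y_dw[:, :1], ch0, atol=1e-5))  # True

# (b) Pointwise: a 1x1 conv is one fully connected layer shared across all pixels
pw = nn.Conv2d(c_in, c_out, kernel_size=1)
fc = nn.Linear(c_in, c_out)
with torch.no_grad():
    fc.weight.copy_(pw.weight.view(c_out, c_in))  # (c_out, c_in, 1, 1) -> (c_out, c_in)
    fc.bias.copy_(pw.bias)
y_pw = pw(y_dw)
y_fc = fc(y_dw.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # channels-last, Linear, channels-first
print(torch.allclose(y_pw, y_fc, atol=1e-5))  # True
print(tuple(y_pw.shape))  # (2, 5, 6, 6)
```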

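(3) GC (Group Convolution Layer) + PWG
First, understand how standard convolution, DW and GC relate. Viewed from the blue connections, i.e. along the channel direction, GC sits between standard convolution and DW: if the number of groups equals the number of feature maps (channels), it is DW; if the number of groups is 1, it is a standard convolution.
In other words, the feature maps are split into groups, each group passes through its own convolution layer, and the outputs are concatenated back together.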
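The split, convolve, concatenate description can be reproduced by hand. A minimal sketch, assuming PyTorch's `groups` argument, whose weight has shape `(out_channels, in_channels // groups, k, k)` with output channels assigned to groups in order. It compares a manual split/concat against `nn.Conv2d(..., groups=2)` and then prints the weight shapes at the two extremes, `groups=1` (standard) and `groups=in_chs` (depthwise). The sizes are illustration values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
in_chs, out_chs, groups = 4, 6, 2
gconv = nn.Conv2d(in_chs, out_chs, 3, padding=1, groups=groups, bias=False)
x = torch.randn(1, in_chs, 8, 8)
y = gconv(x)

# Split channels into groups, convolve each group separately, then concatenate
cin_g, cout_g = in_chs // groups, out_chs // groups
parts = []
for g in range(groups):
    x_g = x[:, g * cin_g:(g + 1) * cin_g]            # this group's input channels
    w_g = gconv.weight[g * cout_g:(g + 1) * cout_g]  # this group's filters: (cout_g, cin_g, 3, 3)
    parts.append(F.conv2d(x_g, w_g, padding=1))
y_manual = torch.cat(parts, dim=1)
print(torch.allclose(y, y_manual, atol=1e-5))  # True

# The two extremes of the group count
standard = nn.Conv2d(in_chs, in_chs, 3, padding=1, groups=1)
depthwise = nn.Conv2d(in_chs, in_chs, 3, padding=1, groups=in_chs)
print(tuple(standard.weight.shape))   # (4, 4, 3, 3): each output ch sees all input chs
print(tuple(depthwise.weight.shape))  # (4, 1, 3, 3): each output ch sees one input ch
```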

2. Code details

(Figure not preserved in this copy.)

With that figure in mind, the code details are easy to follow.

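```python
import torch.nn as nn

# Example values for illustration; in_chs and out_chs must be divisible by groups
in_chs, out_chs, kernel_size, stride, padding, groups = 32, 64, 3, 1, 1, 4

# Standard convolution, weight size = in_chs * out_chs * kernel_size^2
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding)

# Group convolution: groups sets how many groups the channels are split into.
# in_chs and out_chs must both be divisible by groups (otherwise they cannot be split).
# weight size = in_chs * out_chs * kernel_size^2 / groups
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding, groups=groups)

# Depthwise convolution: input chs = output chs = groups, weight size = in_chs * kernel_size^2
nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs)

# Pointwise convolution, i.e. a 1x1 convolution, weight size = in_chs * out_chs
nn.Conv2d(in_chs, out_chs, 1)
```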
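The weight-size comments can be verified by counting parameters. This sketch uses `bias=False` so that the counts contain weights only (with a bias, each layer adds one parameter per output channel); the channel sizes and `groups=4` are example values, and `n_params` is a hypothetical helper.

```python
import torch.nn as nn

def n_params(m):
    """Total number of trainable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in m.parameters())

in_chs, out_chs, k, groups = 32, 64, 3, 4
standard = nn.Conv2d(in_chs, out_chs, k, 1, 1, bias=False)
group = nn.Conv2d(in_chs, out_chs, k, 1, 1, groups=groups, bias=False)
depthwise = nn.Conv2d(in_chs, in_chs, k, 1, 1, groups=in_chs, bias=False)
pointwise = nn.Conv2d(in_chs, out_chs, 1, bias=False)

assert n_params(standard) == in_chs * out_chs * k * k             # 18432
assert n_params(group) == in_chs * out_chs * k * k // groups      # 4608
assert n_params(depthwise) == in_chs * k * k                      # 288
assert n_params(pointwise) == in_chs * out_chs                    # 2048

# DW + PW replaces one standard conv with the same in/out channels
print(n_params(depthwise) + n_params(pointwise), "vs", n_params(standard))  # 2336 vs 18432
```

For these sizes DW + PW needs roughly 1/out_chs + 1/k^2 of the standard layer's weights, about 12.7% here.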

II. Code Example

The comments in the code are quite detailed.
The model uses ReLU6, one of the variants of ReLU.
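ReLU6 (caps the maximum value):
Formula:
$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$$
That is, besides being 0 for x < 0, the derivative is also 0 when x > 6.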
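A small check of the formula, assuming PyTorch's `torch.nn.functional.relu6`: it matches `torch.clamp(x, 0, 6)`, and its gradient is 1 only for 0 < x < 6 and 0 outside that range. The test values are arbitrary and avoid the kink points 0 and 6.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.5, 3.0, 6.5, 10.0], requires_grad=True)
y = F.relu6(x)                               # min(max(0, x), 6)
print(torch.equal(y, torch.clamp(x, 0, 6)))  # True
y.sum().backward()
print(x.grad)  # tensor([0., 1., 1., 0., 0.])
```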

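Purpose:
Mainly to keep good numerical resolution at low precision such as float16 on mobile devices. If ReLU's output is left unbounded, its range is 0 to positive infinity. float16 represents large values only coarsely (the gap between neighbouring values grows with magnitude) and overflows above 65504, so unbounded activations lose precision.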
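To make the precision argument concrete, this sketch uses NumPy's `np.spacing`, which returns the gap to the next representable value. float16 has a 10-bit mantissa, so the gap grows with magnitude: values capped at 6 stay on a grid no coarser than 2^-8, while large unbounded activations sit on a much coarser grid and become infinite beyond 65504. It only illustrates the float16 side of the argument; it does not measure accuracy loss in any trained model.

```python
import numpy as np

for v in [1.0, 6.0, 100.0, 1000.0, 60000.0]:
    gap = np.spacing(np.float16(v))  # distance to the next larger float16
    print(f"{v:>8}: next float16 value is {float(gap)} away")

print(float(np.finfo(np.float16).max))  # 65504.0, the largest finite float16
```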

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentNet(nn.Module):
    '''
      In this net we build the model by stacking Depthwise & Pointwise Convolution layers.
      You will find that after replacing the original Convolution layers with DW & PW,
      accuracy usually does not drop much.

      It is called StudentNet because this model will later be used for Knowledge Distillation.
    '''

    def __init__(self, base=16, width_mult=1):
        '''
          Args:
            base: number of channels at the start of the model; it doubles at each layer
                  until it reaches base*16.
            width_mult: used later for Network Pruning; layers from base*8 chs onward
                        are multiplied by width_mult to give the channel count after pruning.
        '''
        super(StudentNet, self).__init__()
        multiplier = [1, 2, 4, 8, 16, 16, 16, 16]

        # bandwidth: number of channels used by each layer
        bandwidth = [base * m for m in multiplier]

        # Only prune the layers with index 3 to 6 (the last one, index 7, is left as-is)
        for i in range(3, 7):
            bandwidth[i] = int(bandwidth[i] * width_mult)

        self.cnn = nn.Sequential(
            # The first layer is usually not decomposed into DW + PW.
            nn.Sequential(
                nn.Conv2d(3, bandwidth[0], 3, 1, 1),  # bandwidth[0] = base * 1 (16 by default)
                nn.BatchNorm2d(bandwidth[0]),
                nn.ReLU6(),
                nn.MaxPool2d(2, 2, 0),
            ),
            # Every following Sequential block has the same structure, so only this one is annotated
            nn.Sequential(
                # Depthwise Convolution: one 3x3 filter per channel (groups = number of channels)
                nn.Conv2d(bandwidth[0], bandwidth[0], 3, 1, 1, groups=bandwidth[0]),
                # Batch Normalization
                nn.BatchNorm2d(bandwidth[0]),
                # ReLU6 limits a neuron's output to at least 0 and at most 6. The MobileNet family uses ReLU6.
                # The cap is there because very large numbers are hard to fit into float16 or further quantization.
                nn.ReLU6(),
                # Pointwise Convolution
                nn.Conv2d(bandwidth[0], bandwidth[1], 1),
                # No ReLU after the Pointwise Convolution: empirically, Pointwise + ReLU tends to do worse.
                # Downsample after every block
                nn.MaxPool2d(2, 2, 0),
            ),

            nn.Sequential(
                # DW -> BN -> ReLU6 -> PW -> MaxPool, same as the block above
                nn.Conv2d(bandwidth[1], bandwidth[1], 3, 1, 1, groups=bandwidth[1]),
                nn.BatchNorm2d(bandwidth[1]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[1], bandwidth[2], 1),
                nn.MaxPool2d(2, 2, 0),
            ),

            nn.Sequential(
                nn.Conv2d(bandwidth[2], bandwidth[2], 3, 1, 1, groups=bandwidth[2]),
                nn.BatchNorm2d(bandwidth[2]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[2], bandwidth[3], 1),
                nn.MaxPool2d(2, 2, 0),
            ),

            # By this point the image has already been downsampled many times, so no more MaxPool
            nn.Sequential(
                nn.Conv2d(bandwidth[3], bandwidth[3], 3, 1, 1, groups=bandwidth[3]),
                nn.BatchNorm2d(bandwidth[3]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[3], bandwidth[4], 1),
            ),

            nn.Sequential(
                nn.Conv2d(bandwidth[4], bandwidth[4], 3, 1, 1, groups=bandwidth[4]),
                nn.BatchNorm2d(bandwidth[4]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[4], bandwidth[5], 1),
            ),

            nn.Sequential(
                nn.Conv2d(bandwidth[5], bandwidth[5], 3, 1, 1, groups=bandwidth[5]),
                nn.BatchNorm2d(bandwidth[5]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[5], bandwidth[6], 1),
            ),

            nn.Sequential(
                nn.Conv2d(bandwidth[6], bandwidth[6], 3, 1, 1, groups=bandwidth[6]),
                nn.BatchNorm2d(bandwidth[6]),
                nn.ReLU6(),
                nn.Conv2d(bandwidth[6], bandwidth[7], 1),
            ),

            # Global Average Pooling.
            # Inputs of different sizes are reduced to the same shape by GAP, so the FC layer that follows still matches.
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.fc = nn.Sequential(
            # Project directly to the 11-dimensional output.
            nn.Linear(bandwidth[7], 11),
        )

    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)
```
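Two details of the class can be checked without training it: how `width_mult` changes the `bandwidth` list, and the claim that Global Average Pooling makes the output shape independent of the input size. The sketch below is self-contained rather than importing `StudentNet`: `bandwidths`, `dw_pw` and `tiny` are hypothetical helpers that copy the channel rule and the block pattern from the class above, with only two DW/PW blocks for brevity.

```python
import torch
import torch.nn as nn

def bandwidths(base=16, width_mult=1.0):
    """Hypothetical helper: reproduces the channel list built in StudentNet.__init__."""
    multiplier = [1, 2, 4, 8, 16, 16, 16, 16]
    bw = [base * m for m in multiplier]
    for i in range(3, 7):  # same pruning range as the original code
        bw[i] = int(bw[i] * width_mult)
    return bw

def dw_pw(c_in, c_out):
    """Hypothetical helper: one DW -> BN -> ReLU6 -> PW block in the StudentNet style (no MaxPool)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, 1, 1, groups=c_in),
        nn.BatchNorm2d(c_in),
        nn.ReLU6(),
        nn.Conv2d(c_in, c_out, 1),
    )

print(bandwidths(16, 1.0))  # [16, 32, 64, 128, 256, 256, 256, 256]
print(bandwidths(16, 0.5))  # [16, 32, 64, 64, 128, 128, 128, 256]

# Hypothetical tiny network: stem + two DW/PW blocks + GAP + FC to 11 outputs
bw = bandwidths(16, 0.5)
tiny = nn.Sequential(
    nn.Conv2d(3, bw[0], 3, 1, 1),
    nn.BatchNorm2d(bw[0]),
    nn.ReLU6(),
    nn.MaxPool2d(2, 2, 0),
    dw_pw(bw[0], bw[1]),
    dw_pw(bw[1], bw[2]),
    nn.AdaptiveAvgPool2d((1, 1)),  # any spatial size -> 1x1
    nn.Flatten(),
    nn.Linear(bw[2], 11),
)
tiny.eval()
with torch.no_grad():
    for size in (64, 96):
        out = tiny(torch.randn(2, 3, size, size))
        print(size, tuple(out.shape))  # (2, 11) for both input sizes
```

Because GAP collapses whatever spatial size remains to 1x1, the `nn.Linear` input width depends only on the channel count, which is why the final layer's size is set from `bandwidth[7]` in the class.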

Summary

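Network compression methods are best understood by plugging them into a model and running it yourself. If I have time later, I will run them with real data and a model to see how well they actually work.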
