ShuffleNet V1: Principles of the Network Architecture and a TensorFlow 2.0 Implementation

Article link: https://blog.csdn.net/qq_36758914/article/details/106967780

Contents

Introduction
Group convolution
Channel shuffle
ShuffleNet Unit
Overall ShuffleNet architecture
Code implementation

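Introduction

In the paper ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, the authors apply group convolution to the 1×1 (pointwise) convolutions, which they call pointwise group convolution, to reduce computational cost. Group convolution has a side effect: each output channel only sees the input channels of its own group. To restore the flow of information between groups, the paper introduces channel shuffle. Built on these two operations, ShuffleNet is an efficient architecture that, for a given computational budget, can afford more feature-map channels than other state-of-the-art models, which helps small networks encode more information.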

Group convolution

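Group convolution splits a standard convolution into $g$ independent groups, computes each group separately, and then concatenates the results. Concretely:

Suppose the input has shape $H\times W \times C$ and is convolved with $k$ kernels of size $h\times w$;
Split the input into $g$ groups, each of shape $H\times W \times (C/g)$ (assuming $C$ is divisible by $g$);
Split the kernels into $g$ groups as well, each containing $k / g$ kernels of size $h\times w\times (C/g)$ (assuming $k$ is divisible by $g$);
Group by group, convolve each input group with its own kernels using a standard convolution, producing $g$ outputs of shape $H'\times W'\times (k/g)$;
Concatenate the $g$ outputs along the channel dimension to obtain the final feature map of shape $H'\times W'\times k$.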

The figure below (from 白裳) shows an example with two groups:
[Figure: group convolution with two groups]
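To make the steps concrete, here is a minimal NumPy sketch restricted to 1×1 (pointwise) kernels with stride 1 and no bias, which is the case ShuffleNet applies group convolution to. The function name `pointwise_group_conv` and the example shapes are illustrative assumptions, not part of the original code. The sketch also checks that a group convolution equals a standard convolution whose weight matrix is block-diagonal, i.e. output group $i$ only sees input group $i$.

```python
import numpy as np

def pointwise_group_conv(x, weights):
    """1x1 group convolution (stride 1, no bias) on an array of shape (H, W, C).

    weights: list of g arrays, each of shape (C // g, k // g).
    """
    g = len(weights)
    groups = np.split(x, g, axis=-1)                        # g arrays of shape (H, W, C/g)
    outputs = [grp @ w for grp, w in zip(groups, weights)]  # each of shape (H, W, k/g)
    return np.concatenate(outputs, axis=-1)                 # shape (H, W, k)

rng = np.random.default_rng(0)
H, W, C, k, g = 4, 4, 6, 9, 3
x = rng.standard_normal((H, W, C))
weights = [rng.standard_normal((C // g, k // g)) for _ in range(g)]

y = pointwise_group_conv(x, weights)
print(y.shape)  # (4, 4, 9)

# The same result from a standard 1x1 convolution with a block-diagonal weight matrix:
# input channels of group i are only connected to output channels of group i.
dense = np.zeros((C, k))
for i, w in enumerate(weights):
    dense[i * (C // g):(i + 1) * (C // g), i * (k // g):(i + 1) * (k // g)] = w
print(np.allclose(y, x @ dense))  # True
```

The zero blocks in `dense` are exactly the connections that group convolution drops, which is where the parameter savings discussed next come from.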
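The benefit of group convolution is a large reduction in the number of parameters:

For a standard convolution with input shape $H\times W \times C$ and $k$ kernels of size $h\times w$, the number of parameters (ignoring biases) is:
$h\times w\times C\times k$
For group convolution, with the same input and the same number of kernels, the number of parameters is:
$h\times w\times \frac{C}{g}\times \frac{k}{g} \times g=\frac{h\times w\times C\times k}{g}$
This is $g$ times fewer than for the standard convolution. Because every weight is applied at each of the $H'\times W'$ output positions, the number of multiply-adds drops by the same factor of $g$.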
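The formula can be checked with a few lines of plain Python. `conv_params` is a hypothetical helper, and the layer sizes (3×3 kernels, 240 input and output channels, $g = 3$) are arbitrary example values.

```python
def conv_params(h, w, C, k, g=1):
    """Weight count of an h x w convolution from C to k channels with g groups (no bias)."""
    assert C % g == 0 and k % g == 0, "C and k must be divisible by g"
    # g groups, each holding k/g kernels of shape h x w x (C/g)
    return g * (k // g) * (h * w * (C // g))

standard = conv_params(3, 3, 240, 240)        # h*w*C*k
grouped = conv_params(3, 3, 240, 240, g=3)    # h*w*C*k / g
print(standard, grouped, standard // grouped)  # 518400 172800 3
```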

Channel shuffle

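Because group convolution convolves each group of channels separately, every output channel depends only on the input channels of its own group. When group convolutions are stacked, no information flows between groups, which weakens the representation and hurts accuracy. Channel shuffle addresses this by permuting the output channels of a group convolution so that the next group convolution receives channels from every group. The implementation is shown in (c) of the figure below.
[Figure: (a) two stacked group convolutions; (b) the second group convolution takes inputs from different groups; (c) an equivalent implementation using channel shuffle]
The steps are: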

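Suppose the input has shape $h\times w\times c$;
Suppose it is divided into $g$ groups, so each group has $c / g$ channels;
Split the channel dimension into two dimensions, reshaping the input to $h\times w\times g\times c/g$;
Transpose the last two dimensions to obtain shape $h\times w\times c/g\times g$;
Reshape the last two dimensions back into one, recovering shape $h\times w\times c$.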
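The three reshape/transpose steps above can be sketched in NumPy on a single pixel whose channel values are their own indices, so the resulting permutation is visible. `channel_shuffle_np` is a hypothetical name for this illustration. The TensorFlow `channel_shuffle` in the code section follows the same steps with an extra batch axis.

```python
import numpy as np

def channel_shuffle_np(x, g):
    """Channel shuffle on an array of shape (h, w, c), following the steps above."""
    h, w, c = x.shape
    assert c % g == 0, "c must be divisible by g"
    x = x.reshape(h, w, g, c // g)   # split channels into (g, c/g)
    x = x.transpose(0, 1, 3, 2)      # swap the two channel axes -> (c/g, g)
    return x.reshape(h, w, c)        # merge them back into c channels

# One pixel with 6 channels labelled 0..5.
x = np.arange(6).reshape(1, 1, 6)
print(channel_shuffle_np(x, 3).ravel())  # [0 2 4 1 3 5]
print(channel_shuffle_np(x, 2).ravel())  # [0 3 1 4 2 5]
```

With $g = 2$ the original groups are {0, 1, 2} and {3, 4, 5}. After shuffling, a following 2-group convolution receives {0, 3, 1} and {4, 2, 5}, so each of its groups now mixes channels from both original groups.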

ShuffleNet Unit

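In the three figures below:

Figure a) is a bottleneck unit (the ResNet bottleneck with its 3x3 convolution replaced by a 3x3 depthwise convolution). Its second 1x1 Conv restores the number of channels to that of the input, so that the result can be added to the input through the shortcut path.
Figure b) is a ShuffleNet Unit. It differs from figure a) in two ways:
- 1. Both 1x1 Convs of figure a) are replaced by group convolutions, called pointwise group convolutions;
- 2. A channel shuffle is applied after the first pointwise group convolution.
Figure c) is also a ShuffleNet Unit, used for downsampling. It differs from figure b) in three ways:
- 1. The 3x3 depthwise convolution in the middle has stride 2 instead of 1;
- 2. A 3x3 average pooling layer with stride 2 is added on the shortcut path;
- 3. The element-wise addition is replaced by channel concatenation, which enlarges the channel dimension at little extra computational cost.

[Figure: (a) bottleneck unit with depthwise convolution; (b) ShuffleNet Unit; (c) ShuffleNet Unit with stride 2]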
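The channel bookkeeping of the two ShuffleNet Units can be sketched in plain Python. `shufflenet_unit_shape` is a hypothetical helper. It assumes 'same' padding, so a stride-2 layer maps a side length $n$ to $\lceil n/2 \rceil$. For unit c), the branch only has to produce the difference between the requested output channels and the input channels, because the pooled shortcut contributes the input channels to the concatenation.

```python
def shufflenet_unit_shape(h, w, c_in, c_out, stride):
    """Return ((h, w, c) of the unit output, channels produced by the conv branch).

    stride=1: unit b), residual addition, so c_out must equal c_in.
    stride=2: unit c), the branch produces c_out - c_in channels, which are
              concatenated with the average-pooled shortcut (c_in channels).
    """
    if stride == 1:
        assert c_out == c_in, "addition needs matching channel counts"
        return (h, w, c_out), c_out
    branch = c_out - c_in
    # 'same' padding with stride 2 gives ceil(n / 2)
    return ((h + 1) // 2, (w + 1) // 2, c_in + branch), branch

print(shufflenet_unit_shape(56, 56, 24, 144, stride=2))   # ((28, 28, 144), 120)
print(shufflenet_unit_shape(28, 28, 144, 144, stride=1))  # ((28, 28, 144), 144)
```

This is the reason the stride-2 unit in the code below subtracts the input channel count from `out_channels` before building its branch.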

Overall ShuffleNet architecture

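[Figure: ShuffleNet architecture table]
Each of the three stages (Stage 2, 3, 4) starts with one stride-2 ShuffleNet Unit, followed by a stack of stride-1 ShuffleNet Units (3, 7 and 3 of them, respectively).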

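In addition, each stage has twice as many output channels as the previous stage, and within each unit the number of bottleneck channels is set to one quarter of the unit's output channels.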
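Putting these rules together, a short plain-Python sketch can tabulate the per-stage feature-map size and channel counts. `stage_layout` is a hypothetical helper. It assumes a 224×224 input, a stride-2 Conv1 plus stride-2 max pooling before Stage 2, and the 3/7/3 repeats used in the code below. The bottleneck column follows the one-quarter rule stated above.

```python
def stage_layout(first_stage_channels, input_size=224, repeats=(3, 7, 3)):
    """Rows of (stage, output size, output channels, bottleneck channels, number of units)."""
    size = input_size // 4       # Conv1 (stride 2) followed by MaxPool (stride 2)
    channels = first_stage_channels
    layout = []
    for stage, n in zip((2, 3, 4), repeats):
        size //= 2               # the first unit of every stage has stride 2
        layout.append((stage, size, channels, channels // 4, 1 + n))
        channels *= 2            # each stage doubles the channel count
    return layout

for row in stage_layout(144):    # 144 channels in Stage 2, as in the code's example (g = 1)
    print(row)
# (2, 28, 144, 36, 4)
# (3, 14, 288, 72, 8)
# (4, 7, 576, 144, 4)
```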

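[Note] In ShuffleNet, depthwise convolution is applied only to the bottleneck feature maps. Although depthwise convolution has low theoretical complexity, it is difficult to implement efficiently on low-power mobile devices.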

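Finally, a scale factor s controls the number of convolution kernels (channels) of a ShuffleNet variant, as shown in the table below:
[Figure: ShuffleNet variants with different scale factors s]
ShuffleNet s× is defined relative to ShuffleNet 1×: its number of kernels (channels) is s times that of ShuffleNet 1×.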
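As a small illustration, the scaling can be applied to the stage channel counts in plain Python. `scaled_stage_channels` is a hypothetical helper, and the 1× values (144, 288, 576) are the $g = 1$ widths used in the code below. Because the cost of a 1×1 convolution grows with the product of its input and output channels, scaling every width by s changes the overall complexity by roughly $s^2$, as the paper notes. After scaling, the channel counts must stay divisible by $g$ (and the bottleneck width by $g$ as well) for the group convolutions to work.

```python
def scaled_stage_channels(base_channels, s):
    """Stage output channels of ShuffleNet s×, scaled from ShuffleNet 1×."""
    return tuple(int(c * s) for c in base_channels)

base_g1 = (144, 288, 576)   # ShuffleNet 1x, g = 1
print(scaled_stage_channels(base_g1, 0.5))   # (72, 144, 288)
print(scaled_stage_channels(base_g1, 0.25))  # (36, 72, 144)
```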

Code implementation

import numpy as np
import tensorflow as tf

def channel_shuffle(inputs, num_groups):
    # (N, H, W, C) -> (N, H, W, g, C/g) -> swap the last two axes -> (N, H, W, C)
    _, h, w, c = inputs.shape
    x_reshaped = tf.reshape(inputs, [-1, h, w, num_groups, c // num_groups])
    x_transposed = tf.transpose(x_reshaped, [0, 1, 2, 4, 3])
    output = tf.reshape(x_transposed, [-1, h, w, c])
    
    return output

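def group_conv(inputs, filters, kernel, strides, num_groups):
    # Split the channels into num_groups groups, convolve each group with its own
    # Conv2D (filters // num_groups kernels), then concatenate along the channel axis.
    conv_side_layers_tmp = tf.split(inputs, num_groups, axis=3)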
    conv_side_layers = []
    for layer in conv_side_layers_tmp:
        conv_side_layers.append(tf.keras.layers.Conv2D(filters//num_groups, kernel, strides, padding='same')(layer))
    x = tf.concat(conv_side_layers, axis=-1)
    
    return x

def conv(inputs, filters, kernel_size, stride=1, activation=False):
    
    x = tf.keras.layers.Conv2D(filters, kernel_size, stride, padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    if activation:
        x = tf.keras.layers.Activation('relu')(x)
        
    return x

def depthwise_conv_bn(inputs, kernel_size, stride=1):
    # Depthwise convolution + BN; no ReLU follows it, as in the ShuffleNet Unit.
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=kernel_size, 
                                        strides=stride, 
                                        padding='same')(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
        
    return x

def ShuffleNetUnitA(inputs, num_groups):
    # Stride-1 unit (figure b): output channels equal input channels, residual addition.
    in_channels = inputs.shape[-1]
    out_channels = in_channels
    bottleneck_channels = out_channels // 4
    
    x = group_conv(inputs, bottleneck_channels, kernel=1, strides=1, num_groups=num_groups)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = channel_shuffle(x, num_groups)
    x = depthwise_conv_bn(x, kernel_size=3, stride=1)
    x = group_conv(x, out_channels, kernel=1, strides=1, num_groups=num_groups)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.add([inputs, x])
    x = tf.keras.layers.Activation('relu')(x)
    
    return x

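def ShuffleNetUnitB(inputs, out_channels, num_groups):
    # Stride-2 unit (figure c): the branch outputs out_channels - in_channels channels,
    # which are concatenated with the average-pooled shortcut to give out_channels.
    in_channels = inputs.shape[-1]
    out_channels -= in_channels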
    bottleneck_channels = out_channels // 4
    
    x = group_conv(inputs, bottleneck_channels, kernel=1, strides=1, num_groups=num_groups)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = channel_shuffle(x, num_groups)
    x = depthwise_conv_bn(x, kernel_size=3, stride=2)
    x = group_conv(x, out_channels, kernel=1, strides=1, num_groups=num_groups)
    x = tf.keras.layers.BatchNormalization()(x)
    y = tf.keras.layers.AvgPool2D(pool_size=3, strides=2, padding='same')(inputs)
    x = tf.concat([y, x], axis=-1)
    x = tf.keras.layers.Activation('relu')(x)
    
    return x

def stage(inputs, out_channels, num_groups, n):
    
    x = ShuffleNetUnitB(inputs, out_channels, num_groups)
    
    for _ in range(n):
        x = ShuffleNetUnitA(x, num_groups)
        
    return x

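def ShuffleNet(inputs, first_stage_channels, num_groups):
    # Conv1 (24 filters, stride 2) + MaxPool (stride 2), Stages 2-4,
    # global average pooling and a 1000-way Dense layer (logits, no softmax).
    # Note: unlike the paper, group convolution is also used for the first
    # pointwise layer of Stage 2.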
    x = tf.keras.layers.Conv2D(filters=24, 
                               kernel_size=3, 
                               strides=2, 
                               padding='same')(inputs)
    x = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(x)
    
    x = stage(x, first_stage_channels, num_groups, n=3)
    x = stage(x, first_stage_channels*2, num_groups, n=7)
    x = stage(x, first_stage_channels*4, num_groups, n=3)
    
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(1000)(x)
    
    return x

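# Quick check in eager mode: one 224x224 RGB image, g = 1 (144 channels in Stage 2)
inputs = np.zeros((1, 224, 224, 3), np.float32)
print(ShuffleNet(inputs, 144, 1).shape)  # (1, 1000)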