ShuffleNet v1 -- ShuffleNet v2

ShuffleNet v1
ShuffleNet v1 is a lightweight network architecture proposed by Face++ (Megvii). Its main idea is to improve ResNet with group convolution and channel shuffle, so it can be viewed as a compressed version of ResNet.
Group convolution first appeared in AlexNet. Because hardware resources were limited at the time, the convolutions could not all be computed on a single GPU, so the authors split the feature maps across multiple GPUs, processed each part separately, and then fused the results. Group convolution is illustrated in the figure below.
[Figure: group convolution]
In the figure, the input is split into 2 groups along the channel dimension, and the kernels are grouped along both C1 and C2: each kernel within a group has depth C1/2, and each group contains C2/2 kernels. Each group of kernels is convolved only with its corresponding group of input channels, and the group outputs are then concatenated, so the final output still has C2 channels. In general, once the number of groups g is chosen, g identical convolution paths run in parallel: each path takes an H1×W1×(C1/g) input and uses C2/g kernels of size h1×w1×(C1/g), producing an H2×W2×(C2/g) output. The number of parameters becomes 1/g of that of a standard convolution.
For example, mapping (?, ?, ?, 128) to (?, ?, ?, 256) with a 3x3 kernel:
A standard convolution needs 3x3x128x256 = 294,912 parameters.
A group convolution (with group = 8, so each group has 16 input channels and 32 output channels) needs 8x3x3x(128/8)x(256/8) = 36,864 parameters.
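As a quick sanity check of these numbers, the parameter count of a (possibly grouped) convolution can be computed directly; the small helper below is only an illustration and is not part of the ShuffleNet code:

def conv_params(k_h, k_w, c_in, c_out, groups=1):
    # Weight parameters of a (possibly grouped) convolution, ignoring biases.
    # Each group sees c_in / groups input channels and produces c_out / groups output channels.
    return groups * k_h * k_w * (c_in // groups) * (c_out // groups)

print(conv_params(3, 3, 128, 256))            # 294912 -> standard convolution
print(conv_params(3, 3, 128, 256, groups=8))  # 36864  -> grouped convolution with g = 8
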
However, group convolution has a problem (figure (a) below): channels in different groups never exchange information. Channel shuffle permutes the channel order so that information can flow across groups. Figures (b) and (c) below are two equivalent ways of drawing this.
[Figure: (a) stacked group convolutions with no cross-group information flow; (b), (c) channel shuffle between group convolutions]
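Before the TensorFlow implementation further below, here is a small NumPy sketch of the reshape-transpose-reshape trick that implements channel shuffle (the function name and toy tensor are made up for illustration):

import numpy as np

def channel_shuffle_np(x, num_groups):
    n, h, w, c = x.shape
    x = x.reshape(n, h, w, num_groups, c // num_groups)  # split channels into groups
    x = x.transpose(0, 1, 2, 4, 3)                       # swap the group and per-group axes
    return x.reshape(n, h, w, c)                         # flatten back: channels are now interleaved

# With 6 channels in 2 groups, the channel order 0 1 2 3 4 5 becomes 0 3 1 4 2 5.
x = np.arange(6).reshape(1, 1, 1, 6)
print(channel_shuffle_np(x, 2).ravel())  # [0 3 1 4 2 5]
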
Combining these two ideas, the authors designed their own building block, the ShuffleNet unit, shown in (b) and (c) of the figure below:
[Figure: ShuffleNet units (a), (b), (c)]
Code excerpts (from a TensorFlow implementation; see the MG2033/ShuffleNet reference below):

def grouped_conv2d(name, x, w=None, num_filters=16, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
                   initializer=tf.contrib.layers.xavier_initializer(), num_groups=1, l2_strength=0.0, bias=0.0,
                   activation=None, batchnorm_enabled=False, dropout_keep_prob=-1,
                   is_training=True):
    with tf.variable_scope(name) as scope:
        # Split the input channels into `num_groups` equal slices, convolve each slice with its own
        # num_filters // num_groups kernels, then concatenate the per-group outputs along the channel axis.
        sz = x.get_shape()[3].value // num_groups
        conv_side_layers = [conv2d(name + "_" + str(i), x[:, :, :, i * sz:i * sz + sz], w,
                                   num_filters // num_groups, kernel_size,
                                   padding, stride, initializer, l2_strength, bias, activation=None,
                                   batchnorm_enabled=False, max_pool_enabled=False,
                                   dropout_keep_prob=dropout_keep_prob, is_training=is_training)
                            for i in range(num_groups)]
        conv_g = tf.concat(conv_side_layers, axis=-1)

        # Optional batch normalization and activation applied to the concatenated output.
        if batchnorm_enabled:
            conv_o_bn = tf.layers.batch_normalization(conv_g, training=is_training, epsilon=1e-5)
            conv_a = conv_o_bn if not activation else activation(conv_o_bn)
        else:
            conv_a = conv_g if not activation else activation(conv_g)
        return conv_a

def __depthwise_conv2d_p(name, x, w=None, kernel_size=(3, 3), padding='SAME', stride=(1, 1),
                         initializer=tf.contrib.layers.xavier_initializer(), l2_strength=0.0, bias=0.0):
    with tf.variable_scope(name):
        stride = [1, stride[0], stride[1], 1]
        # Depthwise convolution: one kernel per input channel (channel multiplier = 1).
        kernel_shape = [kernel_size[0], kernel_size[1], x.shape[-1], 1]
        with tf.name_scope('layer_weights'):
            if w is None:
                w = __variable_with_weight_decay(kernel_shape, initializer, l2_strength)
            __variable_summaries(w)
        with tf.name_scope('layer_biases'):
            if isinstance(bias, float):
                bias = tf.get_variable('biases', [x.shape[-1]], initializer=tf.constant_initializer(bias))
            __variable_summaries(bias)
        with tf.name_scope('layer_conv2d'):
            conv = tf.nn.depthwise_conv2d(x, w, stride, padding)
            out = tf.nn.bias_add(conv, bias)
        return out

def shufflenet_unit(name, x, w=None, num_groups=1, group_conv_bottleneck=True, num_filters=16, stride=(1, 1),
                    l2_strength=0.0, bias=0.0, batchnorm_enabled=True, is_training=True, fusion='add'):
    activation = tf.nn.relu
    with tf.variable_scope(name) as scope:
        residual = x
        # With 'add' fusion the bottleneck keeps 1/4 of the output channels; with 'concat' fusion
        # only the channels that will be appended to the residual are produced.
        bottleneck_filters = (num_filters // 4) if fusion == 'add' else (num_filters - residual.get_shape()[3].value) // 4

        # 1x1 group convolution (or a plain 1x1 convolution for the first unit) followed by channel shuffle.
        if group_conv_bottleneck:
            bottleneck = grouped_conv2d('Gbottleneck', x=x, w=None, num_filters=bottleneck_filters,
                                        kernel_size=(1, 1), padding='VALID', num_groups=num_groups,
                                        l2_strength=l2_strength, bias=bias, activation=activation,
                                        batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            shuffled = channel_shuffle('channel_shuffle', bottleneck, num_groups)
        else:
            bottleneck = conv2d('bottleneck', x=x, w=None, num_filters=bottleneck_filters, kernel_size=(1, 1),
                                padding='VALID', l2_strength=l2_strength, bias=bias, activation=activation,
                                batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            shuffled = bottleneck

        # 3x3 depthwise convolution without activation, padded manually.
        padded = tf.pad(shuffled, [[0, 0], [1, 1], [1, 1], [0, 0]], "CONSTANT")
        depthwise = depthwise_conv2d('depthwise', x=padded, w=None, stride=stride, l2_strength=l2_strength,
                                     padding='VALID', bias=bias, activation=None,
                                     batchnorm_enabled=batchnorm_enabled, is_training=is_training)

        # The shortcut branch is average-pooled when the unit downsamples (stride 2).
        if stride == (2, 2):
            residual_pooled = avg_pool_2d(residual, size=(3, 3), stride=stride, padding='SAME')
        else:
            residual_pooled = residual

        if fusion == 'concat':
            # Stride-2 unit (figure (c)): concatenate the shortcut with the branch output.
            group_conv1x1 = grouped_conv2d('Gconv1x1', x=depthwise, w=None,
                                           num_filters=num_filters - residual.get_shape()[3].value,
                                           kernel_size=(1, 1), padding='VALID', num_groups=num_groups,
                                           l2_strength=l2_strength, bias=bias, activation=None,
                                           batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            return activation(tf.concat([residual_pooled, group_conv1x1], axis=-1))
        elif fusion == 'add':
            # Stride-1 unit (figure (b)): element-wise addition with the shortcut.
            group_conv1x1 = grouped_conv2d('Gconv1x1', x=depthwise, w=None, num_filters=num_filters,
                                           kernel_size=(1, 1), padding='VALID', num_groups=num_groups,
                                           l2_strength=l2_strength, bias=bias, activation=None,
                                           batchnorm_enabled=batchnorm_enabled, is_training=is_training)
            residual_match = residual_pooled
            # This is used if the number of filters of the residual block is different from that
            # of the group convolution.
            if num_filters != residual_pooled.get_shape()[3].value:
                residual_match = conv2d('residual_match', x=residual_pooled, w=None, num_filters=num_filters,
                                        kernel_size=(1, 1), padding='VALID', l2_strength=l2_strength, bias=bias,
                                        activation=None, batchnorm_enabled=batchnorm_enabled,
                                        is_training=is_training)
            return activation(group_conv1x1 + residual_match)
        else:
            raise ValueError("Specify whether the fusion is 'concat' or 'add'")

def channel_shuffle(name, x, num_groups):
    with tf.variable_scope(name) as scope:
        # Reshape to (N, H, W, groups, channels_per_group), swap the last two axes,
        # then flatten back so that channels from different groups are interleaved.
        n, h, w, c = x.shape.as_list()
        x_reshaped = tf.reshape(x, [-1, h, w, num_groups, c // num_groups])
        x_transposed = tf.transpose(x_reshaped, [0, 1, 2, 4, 3])
        output = tf.reshape(x_transposed, [-1, h, w, c])
        return output

def __stage(self, x, stage=2, repeat=3):
    if 2 <= stage <= 4:
        # First unit of each stage: stride 2 with concat fusion, then `repeat` stride-1 units with add fusion.
        stage_layer = shufflenet_unit('stage' + str(stage) + '_0', x=x, w=None, num_groups=self.args.num_groups,
                                      group_conv_bottleneck=not (stage == 2),
                                      num_filters=self.output_channels[str(self.args.num_groups)][stage - 2],
                                      stride=(2, 2), fusion='concat', l2_strength=self.args.l2_strength,
                                      bias=self.args.bias, batchnorm_enabled=self.args.batchnorm_enabled,
                                      is_training=self.is_training)
        for i in range(1, repeat + 1):
            stage_layer = shufflenet_unit('stage' + str(stage) + '_' + str(i), x=stage_layer, w=None,
                                          num_groups=self.args.num_groups, group_conv_bottleneck=True,
                                          num_filters=self.output_channels[str(self.args.num_groups)][stage - 2],
                                          stride=(1, 1), fusion='add', l2_strength=self.args.l2_strength,
                                          bias=self.args.bias, batchnorm_enabled=self.args.batchnorm_enabled,
                                          is_training=self.is_training)
        return stage_layer

def __build(self):
    with tf.name_scope('Preprocessing'):
        # Subtract the per-channel means first and scale; MEAN and NORMALIZER are defined earlier in the class.
        red, green, blue = tf.split(self.X, num_or_size_splits=3, axis=3)
        preprocessed_input = tf.concat([tf.subtract(blue, ShuffleNet.MEAN[0]) * ShuffleNet.NORMALIZER,
                                        tf.subtract(green, ShuffleNet.MEAN[1]) * ShuffleNet.NORMALIZER,
                                        tf.subtract(red, ShuffleNet.MEAN[2]) * ShuffleNet.NORMALIZER], 3)
    # Stem: 3x3 stride-2 convolution followed by 3x3 stride-2 max pooling.
    x_padded = tf.pad(preprocessed_input, [[0, 0], [1, 1], [1, 1], [0, 0]], "CONSTANT")
    conv1 = conv2d('conv1', x=x_padded, w=None, num_filters=self.output_channels['conv1'], kernel_size=(3, 3),
                   stride=(2, 2), l2_strength=self.args.l2_strength, bias=self.args.bias,
                   batchnorm_enabled=self.args.batchnorm_enabled, is_training=self.is_training,
                   activation=tf.nn.relu, padding='VALID')
    padded = tf.pad(conv1, [[0, 0], [0, 1], [0, 1], [0, 0]], "CONSTANT")
    max_pool = max_pool_2d(padded, size=(3, 3), stride=(2, 2), name='max_pool')

    # Three stages of ShuffleNet units, then global average pooling and a 1x1 "fully connected" layer.
    stage2 = self.__stage(max_pool, stage=2, repeat=3)
    stage3 = self.__stage(stage2, stage=3, repeat=7)
    stage4 = self.__stage(stage3, stage=4, repeat=3)

    global_pool = avg_pool_2d(stage4, size=(7, 7), stride=(1, 1), name='global_pool', padding='VALID')
    logits_unflattened = conv2d('fc', global_pool, w=None, num_filters=self.args.num_classes,
                                kernel_size=(1, 1), l2_strength=self.args.l2_strength,
                                bias=self.args.bias, is_training=self.is_training)
    self.logits = flatten(logits_unflattened)
    self.__init_output()
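
For reference, the output_channels table used by __stage and __build corresponds to the 1x-complexity settings of the ShuffleNet v1 paper (stage 2/3/4 widths for each group count g); the exact dictionary layout below is an assumption for illustration, not copied from this implementation:

# Assumed layout of output_channels, based on Table 1 of the ShuffleNet v1 paper (1x scale).
output_channels = {
    'conv1': 24,
    '1': [144, 288, 576],
    '2': [200, 400, 800],
    '3': [240, 480, 960],
    '4': [272, 544, 1088],
    '8': [384, 768, 1536],
}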

ShuffleNet v2
By analyzing ShuffleNet v1 and MobileNet v2 and benchmarking their performance on two different hardware platforms (GPU and ARM), the authors derived four guidelines for efficient network design, which move beyond the limitations of considering FLOPs alone:

Guideline 1 (G1): Keeping the number of input channels equal to the number of output channels minimizes memory access cost (MAC); a quick numeric check of this follows the list.
Guideline 2 (G2): Using too many groups in group convolution increases MAC.
Guideline 3 (G3): Excessive network fragmentation (too many branches and small building blocks) reduces the degree of parallelism.
Guideline 4 (G4): The cost of element-wise operations (ReLU, tensor addition, bias addition, and so on) is not negligible.
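To make G1 concrete, the ShuffleNet v2 paper models a 1x1 convolution on an h×w feature map with c1 input and c2 output channels: its FLOPs are B = h·w·c1·c2 and its memory access cost is MAC = h·w·(c1 + c2) + c1·c2, which for fixed B is smallest when c1 = c2. The short numeric check below uses arbitrary example sizes:

# G1: for a fixed FLOP budget B = h*w*c1*c2, MAC = h*w*(c1 + c2) + c1*c2 is minimized when c1 == c2.
h = w = 56
B = h * w * 128 * 128          # fix the FLOP budget

def mac(c1, c2):
    return h * w * (c1 + c2) + c1 * c2

for c1 in (32, 64, 128, 256, 512):
    c2 = B // (h * w * c1)     # choose c2 so that the FLOPs stay constant
    print(c1, c2, mac(c1, c2))
# The balanced setting c1 = c2 = 128 gives the lowest MAC.
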
ShuffleNet v1 relies heavily on group convolution (violating G2) and on bottleneck-shaped building blocks (violating G1). MobileNet v2 uses an inverted bottleneck structure, violating G1, and applies depthwise convolution and ReLU on "thick" feature maps, violating G4. Automatically generated architectures tend to be highly fragmented, violating G3.
Based on these guidelines the authors designed a new architecture. It is an improved version of ShuffleNet v1 and is therefore called ShuffleNet v2.
The figure below shows the basic units of ShuffleNet v1 and ShuffleNet v2:
[Figure: (a), (b) ShuffleNet v1 units; (c), (d) ShuffleNet v2 units]
Looking carefully at how (c) and (d) modify the network, we can observe the following:
In (c), ShuffleNet v2 uses a channel split operation, which divides the channels into groups (in practice, two groups).
After the split there are two branches. To satisfy G3, the left branch is an identity mapping; the right branch is a sequence of convolutions whose input and output channel counts are equal, satisfying G1.
In the right branch, the 1x1 convolutions no longer use group convolution, partly because the split has already produced two groups.
At the end, the two branches are merged by concatenation (Concat), so the total number of channels of the block is unchanged. Using concat instead of the original element-wise add, and not applying ReLU after the merge, satisfies G4.
The concatenation is followed by the same channel shuffle operation as in ShuffleNet v1. When stacking ShuffleNet v2 units, the channel concat, channel shuffle, and channel split of adjacent units can be fused into a single element-wise operation, again in the spirit of G4. A minimal sketch of the stride-1 unit is given below.
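
The sketch below is written from this description using tf.keras layers; it follows the structure of the paper's basic unit but is not the authors' reference implementation, and details such as BatchNorm placement and bias settings are assumptions:

import tensorflow as tf

def shuffle_channels(x, num_groups=2):
    # Reshape-transpose-reshape channel shuffle, assuming static H, W, C.
    _, h, w, c = x.shape
    x = tf.reshape(x, [-1, h, w, num_groups, c // num_groups])
    x = tf.transpose(x, [0, 1, 2, 4, 3])
    return tf.reshape(x, [-1, h, w, c])

def shufflenet_v2_unit(x):
    c = x.shape[-1]
    # Channel split: half the channels take an identity shortcut (few branches, G3).
    shortcut, branch = tf.split(x, 2, axis=-1)
    # 1x1 conv -> 3x3 depthwise conv -> 1x1 conv, with equal input/output channels (G1)
    # and no group convolution in the 1x1 layers (G2).
    branch = tf.keras.layers.Conv2D(c // 2, 1, use_bias=False)(branch)
    branch = tf.keras.layers.BatchNormalization()(branch)
    branch = tf.keras.layers.ReLU()(branch)
    branch = tf.keras.layers.DepthwiseConv2D(3, padding='same', use_bias=False)(branch)
    branch = tf.keras.layers.BatchNormalization()(branch)
    branch = tf.keras.layers.Conv2D(c // 2, 1, use_bias=False)(branch)
    branch = tf.keras.layers.BatchNormalization()(branch)
    branch = tf.keras.layers.ReLU()(branch)
    # Concat instead of element-wise add, no ReLU after the merge (G4), then channel shuffle.
    out = tf.concat([shortcut, branch], axis=-1)
    return shuffle_channels(out, num_groups=2)

# Example: the stride-1 unit keeps the shape of its input.
y = shufflenet_v2_unit(tf.random.normal([1, 56, 56, 116]))  # y.shape == (1, 56, 56, 116)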

References:
https://blog.csdn.net/yuanlulu/article/details/84867353
https://blog.csdn.net/u010801994/article/details/85005979
https://github.com/MG2033/ShuffleNet
