Paper Reading: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

ShuffleNet v1
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Section 1 & 2

Group convolution and separable convolution

Group convolution was first introduced in AlexNet, where the network was split into parts (two, for example) and trained on two independent GPUs. The authors adopt group convolution here while noting that placing too many restrictions between channels may cause a loss of accuracy.
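To see why grouping saves computation, here is a quick parameter count. This is a sketch of the standard formula, not code from the paper; the function name and the 256-channel example are my own choices.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Parameters of a k x k convolution split into `groups` groups:
    each group maps c_in/groups input channels to c_out/groups outputs."""
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_params(256, 256, 3)            # 589824
grouped  = conv_params(256, 256, 3, groups=2)  # 294912, half the parameters
print(standard, grouped)
```

With g groups the cost drops by a factor of g, which is exactly why tiny networks like ShuffleNet lean on it.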
Separable convolution was proposed in Xception and is a different way of performing convolution: it generates feature maps of the same shape with fewer parameters than a standard convolution, and can therefore reduce the computation cost.
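The parameter savings of a depthwise separable convolution (a k x k depthwise convolution followed by a 1x1 pointwise convolution) can be sketched with a small calculation; the function names and the 128-channel example are mine, chosen for illustration.

```python
def standard_params(c_in, c_out, k):
    """A standard k x k convolution mixes every input with every output."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise (c_in filters of k x k) plus pointwise (c_in -> c_out, 1x1)."""
    return c_in * k * k + c_in * c_out

print(standard_params(128, 128, 3))   # 147456
print(separable_params(128, 128, 3))  # 17536, roughly an 8.4x reduction
```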

Other

Some introductions about shuffle operation and acceleration methods.

Section 3

Shortcomings of pointwise convolution

In depthwise separable convolution, the novel idea of pointwise convolution (also called 1×1 convolution) does simplify the convolution operation, but it also occupies the majority of the computation. Thus, in tiny networks, the expensive pointwise convolutions limit the number of channels allowed under the complexity constraint, which may significantly damage accuracy.

Channel shuffle for group convolution

In a group convolution, the operations within a group depend only on the corresponding input channels, which means there is no cross-talk between different groups, as shown in (a) for two stacked group convolutions with 3 groups each.
To achieve higher performance, however, the authors insert a channel shuffle between the two stacked group convolutions. Figures (b) and (c) are equivalent implementations of channel shuffle.
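Channel shuffle itself is just a reshape-transpose-flatten of the channel dimension. A minimal pure-Python sketch (operating on a list of channel indices rather than real tensors) shows the permutation:

```python
def channel_shuffle(channels, groups):
    """Reshape the channel list to (groups, n), transpose to (n, groups),
    and flatten, so each group's output mixes channels from every group."""
    n = len(channels) // groups
    # channel at position i*n + j (group i, slot j) moves to j*groups + i
    return [channels[i * n + j] for j in range(n) for i in range(groups)]

# 6 channels in 3 groups of 2: [0,1 | 2,3 | 4,5] becomes interleaved
print(channel_shuffle(list(range(6)), 3))  # [0, 2, 4, 1, 3, 5]
```

In a real network the same effect is obtained with a tensor reshape and transpose, which is why the operation is essentially free.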

ShuffleNet Unit

The ShuffleNet units are constructed based on the residual blocks first proposed in ResNet. The initial structure is shown in (a), and the addition of channel shuffle in (b) makes the unit more powerful without extra layers. Thus, the advantage of this structure is that it can achieve good results with fewer layers.
Figure (c) adds a stride of 2 to the convolution for spatial downsampling.
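In the stride-2 unit of (c), the shortcut is a 3×3 average pooling of the input, and it is concatenated with (rather than added to) the branch output, so the branch only needs to produce the remaining channels. A small shape sketch (my own helper and example numbers, following the paper's stage-2 width of 240 channels for g=3):

```python
def stride2_unit_shapes(c_in, c_out, h, w):
    """Output shape of a stride-2 ShuffleNet unit and the number of
    channels the convolutional branch must produce: the avg-pooled
    shortcut keeps c_in channels, concat fills up to c_out."""
    branch_channels = c_out - c_in
    return (c_out, h // 2, w // 2), branch_channels

# 24 input channels, 240 output channels on a 56x56 map
print(stride2_unit_shapes(24, 240, 56, 56))  # ((240, 28, 28), 216)
```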

Section 4

From the reported results, I deduce that grouping by itself contributes only a little to reducing the error rate, but the shuffle operation does help.
