ShuffleNet v1
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Contents
Section 1 & 2
Group convolution and separable convolution
Group convolution was first introduced in AlexNet, where the network was split into two parts, for example, and trained on two independent GPUs. The authors adopt group convolution here, noting that restricting the connections between channels too heavily may cause a loss of accuracy.
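A quick way to see why grouping is cheaper is to count weights. The sketch below assumes square kernels and no bias term; the 256-channel, 2-group numbers mirror the AlexNet example, not the note itself.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution, optionally grouped (no bias)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each group maps c_in/groups input channels to c_out/groups outputs.
    return groups * (c_in // groups) * (c_out // groups) * k * k

# A 3x3 conv with 256 -> 256 channels, split into 2 groups as in AlexNet:
standard = conv_params(256, 256, 3)            # 589,824 weights
grouped  = conv_params(256, 256, 3, groups=2)  # 294,912 -- half the parameters
```

In general, splitting into `g` groups divides both the parameter count and the multiply-adds by `g`, at the cost of removing cross-group connections.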
Separable convolution was proposed in Xception and is a different way of performing convolution: it produces feature maps of the same shape with far fewer parameters than a standard convolution, thereby reducing the computation cost.
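The saving can be made concrete by comparing parameter counts. A minimal sketch, assuming a 3 × 3 kernel, 256 input and output channels, and no biases (the specific channel counts are illustrative, not from the note):

```python
def separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) + pointwise 1x1."""
    depthwise = c_in * k * k       # spatial filtering, per channel
    pointwise = c_in * c_out       # channel mixing
    return depthwise + pointwise

standard  = 256 * 256 * 3 * 3            # 589,824 weights
separable = separable_params(256, 256, 3)  # 2,304 + 65,536 = 67,840 weights
```

Here the separable version uses roughly 8.7× fewer parameters while producing an output of the same shape.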
Other
Brief introductions to the shuffle operation and to model acceleration methods.
Section 3
Shortcomings of pointwise convolution
In a depthwise separable convolution, the pointwise convolution (also called 1 × 1 convolution) does simplify the convolution operation, but it also occupies the majority of the computation. Thus, in tiny networks, the expensive pointwise convolutions allow only a limited number of channels under a given complexity constraint, which may significantly damage the accuracy.
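This dominance is easy to verify by counting multiply-adds for one depthwise separable layer. The feature-map size and channel counts below are made-up examples for illustration:

```python
h = w = 28           # spatial size of the feature map (assumed)
c_in = c_out = 256   # channel counts (assumed)
k = 3                # depthwise kernel size

depthwise_flops = h * w * c_in * k * k   # one k x k filter per channel
pointwise_flops = h * w * c_in * c_out   # 1x1 conv mixing all channels
share = pointwise_flops / (depthwise_flops + pointwise_flops)
print(f"pointwise share: {share:.1%}")   # ~96.6% of the multiply-adds
```

With these numbers the 1 × 1 convolution accounts for over 96% of the work, which is exactly why ShuffleNet makes the pointwise layer grouped.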
Channel shuffle for group convolution
In a group convolution, operations in the same group depend only on the corresponding input channels, which means there is no cross talk between different groups. This is shown in (a): two stacked group convolutions with 3 groups each.
To achieve higher performance, however, the authors insert a channel shuffle between the two stacked group convolutions. Figures (b) and (c) are equivalent implementations of channel shuffle.
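The implementation in figure (c) reduces to a reshape, a transpose of the group and sub-channel axes, and a flatten. A minimal numpy sketch on an (N, C, H, W) tensor:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Reshape (N, C, H, W) to (N, groups, C // groups, H, W),
    swap the two group axes, then flatten back to (N, C, H, W)."""
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(6).reshape(1, 6, 1, 1)   # channels [0, 1, 2, 3, 4, 5]
y = channel_shuffle(x, groups=3)
print(y.ravel())                        # [0 2 4 1 3 5]
```

After the shuffle, each group of the next group convolution receives one channel from every group of the previous one, restoring cross-group information flow.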
ShuffleNet Unit
The ShuffleNet units are constructed based on the residual block first proposed in ResNet. The initial structure is shown in (a); replacing the first 1 × 1 convolution with a pointwise group convolution followed by a channel shuffle, as in (b), makes the unit more powerful without adding layers. Thus, the advantage of this structure is that it can achieve good results with fewer layers.
Figure (c) handles the stride-2 case: a stride of 2 is applied in the depthwise convolution, a 3 × 3 average pooling is added on the shortcut path, and the two branches are concatenated instead of added, which enlarges the channel dimension at little extra cost.
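The channel bookkeeping of the two unit variants can be checked with a few lines. The concrete channel counts below are made-up examples, not taken from the paper's tables:

```python
def unit_out_channels(c_in, c_branch, stride):
    """Output channels of a ShuffleNet unit.
    stride 1: identity shortcut is added elementwise, so the branch
              must produce exactly c_in channels.
    stride 2: the avg-pooled shortcut is concatenated with the branch,
              so the channel counts add up."""
    if stride == 1:
        assert c_branch == c_in, "elementwise add needs matching channels"
        return c_in
    return c_in + c_branch

print(unit_out_channels(240, 240, stride=1))  # 240
print(unit_out_channels(24, 216, stride=2))   # 240 (24 + 216, via concat)
```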
Section 4
From the results shown, I deduce that the group operation by itself contributes only a little to reducing the error rate, but the shuffle operation does help.