Intuition
- Inception series
- A standard Conv maps cross-channel correlations and spatial correlations at the same time.
- The Inception module makes this process easier and more efficient by explicitly factoring it into a series of operations that look at cross-channel correlations and spatial correlations independently.
- 1x1 Conv -> cross-channel correlation; 3x3 & 5x5 Conv -> spatial correlation.
- An extreme version of this separation decouples the cross-channel and spatial operations entirely; this is what Xception ("Extreme Inception") does.
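To make the decoupling concrete, here is a minimal sketch that counts weights for a joint k x k Conv versus a fully decoupled one (a 1x1 Conv plus one k x k filter per channel). The function names and the channel sizes are illustrative assumptions, not values from the paper, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k Conv that mixes channels and space jointly (no bias)."""
    return k * k * c_in * c_out


def decoupled_conv_params(k, c_in, c_out):
    """Weights when the two roles are split (no bias):
    a 1x1 Conv for channels, then one k x k filter per output channel for space."""
    pointwise = c_in * c_out   # 1x1 Conv: cross-channel correlation only
    depthwise = k * k * c_out  # one k x k filter per channel: spatial correlation only
    return pointwise + depthwise


# Illustrative sizes (assumed, not taken from the paper).
joint = standard_conv_params(3, 128, 128)
split = decoupled_conv_params(3, 128, 128)
print(joint, split)  # 147456 17536
```

With these assumed sizes the decoupled form needs roughly 8x fewer weights, because the k x k spatial filters no longer multiply with the number of input channels.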
Xception
- Xception module:
- First apply a 1x1 Conv.
- Then apply a depthwise separable convolution (DSC): each feature map gets its own 3x3 Conv, and the per-channel results are concatenated.
- Advantages: Efficient parameter usage
- Whole model
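The module steps listed above (a 1x1 Conv followed by a separate 3x3 Conv on each feature map) can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: the helper names, the zero padding, stride 1, and the toy weights are all hypothetical choices, and no activation is placed between the two steps.

```python
def pointwise_conv(x, w):
    """1x1 Conv. x: [C_in][H][W], w: [C_out][C_in] -> [C_out][H][W]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(w_o[c] * x[c][i][j] for c in range(c_in)) for j in range(wd)]
         for i in range(h)]
        for w_o in w
    ]


def depthwise_conv3x3(x, kernels):
    """One 3x3 filter per channel, zero padding 1, stride 1 (assumed settings).
    x: [C][H][W], kernels: [C][3][3] -> [C][H][W]."""
    out = []
    for channel, k in zip(x, kernels):
        h, wd = len(channel), len(channel[0])
        fmap = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                acc = 0.0
                for di in range(3):
                    for dj in range(3):
                        ii, jj = i + di - 1, j + dj - 1
                        if 0 <= ii < h and 0 <= jj < wd:  # skip padded positions
                            acc += k[di][dj] * channel[ii][jj]
                fmap[i][j] = acc
        out.append(fmap)  # stacking the per-channel maps is the "concatenate" step
    return out


def xception_module(x, w_pointwise, k_depthwise):
    """1x1 Conv first, then a per-channel 3x3 Conv, with no activation in between."""
    return depthwise_conv3x3(pointwise_conv(x, w_pointwise), k_depthwise)


# Toy input: 2 channels of 3x3 ones; the 1x1 Conv expands them to 3 channels.
x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
w = [[1, 1], [1, -1], [0.5, 0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
y = xception_module(x, w, [identity, ones, ones])
print(y[0][1][1], y[2][1][1], y[2][0][0])  # 2.0 4.5 2.0
```

Each output channel is produced by exactly one 3x3 filter, so no spatial filter ever sees more than one feature map; all cross-channel mixing happens in the 1x1 step.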
Experiment
- Datasets: JFT (internal Google dataset), ImageNet, FastEval14k.
- Result
- Xception converges faster than Inception V3 and gets higher accuracy.
- 21.0% top-1, 5.5% top-5 error on ImageNet.
- Performs better with residual connections.
- Performs worse with a non-linearity between the 1x1 Conv and the DSC.